# <div class="alert alert-success" >(1) Pandas Basics 
    
Pandas is a high-performance Python library used to analyze data. It has functions for analyzing, cleaning, exploring, and manipulating data. The name "Pandas" refers to both 'Panel Data' and 'Python Data Analysis'; the library was created by Wes McKinney in 2008.
    
#### Why use Pandas?

- It allows us to analyze big data and draw conclusions based on statistical theory.

- Cleans messy datasets and makes them relevant and readable. Relevant data is very important in <mark>Data Science</mark> (DS).
    
- Provides a comprehensive set of data structures for manipulating tabular data.
    
- Provides high-performance indexing, automatic alignment, reshaping, grouping, joining, and statistical analyses capabilities.



## <div class= "alert alert-info">Pandas Data Structures
    
The two primary data structures in pandas are:
- Series objects
- DataFrame objects 

## <div class= "alert alert-info"> The Series Object

A Series represents a 1D labeled indexed array based on the NumPy ndarray. A pandas Series deviates from NumPy arrays by adding an associated set of labels that are used to index and efficiently access the elements of the array by the label values instead of just by the integer position.

A pandas Series is comparable to a single column in an Excel sheet.
 
<b> Labels:</b>
    
Labels are a key feature of a pandas Series and add significant power for accessing the elements of the Series compared with a NumPy array. A Series always has labels, even if none are specified. In this default case, pandas creates labels that consist of sequential integers starting from zero. This default behavior makes a Series initially appear very similar to a NumPy array. This is by design, as a Series was derived from a NumPy array, which allowed a Series to be used by existing NumPy array code that relied on integer-based position lookup.

Even though a Series with default integer index labels appears identical to a NumPy array, elements are accessed by label value rather than by integer position. The pandas library uses the provided labels to look up the values for those labels. Unlike array positions, index labels do not need to be integers, can be repeated, can be hierarchical, and are central to a pandas concept known as automatic alignment of values by index label.
    
<b> Index: </b>
The axis labels are collectively called the index.
    
<b> Automatic alignment of values:</b>
    
Automatic alignment is arguably the most significant change that a Series makes over ndarray. Operations (+, -, /, *, etc.) applied across multiple pandas objects are not blindly applied to the values in order by position in the Series. The pandas library first aligns the two pandas objects by their index labels and then applies the operation to the values with aligned labels. This is, in a way, a simple type of join and allows you to associate data with common index labels without any effort.
    
Pandas provides various specializations of indexes for different data types with each being highly optimized for that specific type of data, be it integers, floats, strings, datetime objects, or any type of <mark>hashable pandas object</mark>. Additionally, a Series can be reindexed into other types of indexes, effectively providing different views into the Series object using different indexes.
    
This ability to dynamically construct alternative views on data using ad hoc indexes establishes an environment for interactive data manipulation, where data can stay in a single structure but can be easily morphed into different views. This makes it possible to explore information interactively and discover its meaning intuitively without being overburdened by its structure, as can happen with relational tools such as SQL.
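
As a small illustration of the specialized index types and alternative views described above, the following sketch builds the same values with three kinds of labels and then looks at one Series through a different set of labels. The variable names (`int_idx`, `str_idx`, `date_idx`, `view`) and the dates are made up for this example.

```python
import pandas as pd

values = [10, 20, 30]

# pandas chooses an index type based on the labels it is given
int_idx = pd.Series(values, index=[1, 2, 3]).index
str_idx = pd.Series(values, index=["a", "b", "c"]).index
date_idx = pd.Series(values, index=pd.date_range("2024-01-01", periods=3)).index

for idx in (int_idx, str_idx, date_idx):
    print(type(idx).__name__, idx.dtype)

# the same data seen through a different selection and order of labels
s = pd.Series(values, index=["a", "b", "c"])
view = s.reindex(["c", "a"])
print(view)
```

The exact class names and dtypes printed depend on the installed pandas version, so they are not listed here.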
    
<b> Datapoints: </b>
    Rows in a series are called datapoints

### <div class= "alert alert-danger">Creating a Series
    
In the real world, a pandas Series is usually created by loading data from existing storage, such as a SQL database, a CSV file, or an Excel file. A Series can also be created and initialized by passing a scalar value, a NumPy ndarray, a Python list, or a Python dict as the data parameter of the Series constructor ( pd.<span style="color:red"> Series(</span> data <span style="color:red"> )</span> ). data is the first parameter, so its name does not need to be specified when it is passed first. Here are some of the ways to create a Series:
    
   

In [112]:
import pandas as pd   # To create a series we first need to import the pandas library

#### (a). Creating a series from a scalar


In [113]:
scalar = 55  
a_series= pd.Series(scalar)
a_series

0    55
dtype: int64

<mark> Note the output when the series 'a_series' is printed: two integers are displayed. The 0 is the index label of the single item in the Series, whose value is 55.</mark>

In [195]:
type(a_series)

pandas.core.series.Series

#### (b). Creating a series from a list


In [114]:
lst = [1 ,2 , 3, 4]
b_series = pd.Series(lst)
b_series

0    1
1    2
2    3
3    4
dtype: int64

<div class= "alert alert-warning"> When creating a Series object with a scalar and specifying an index with multiple labels, pandas will copy the scalar value to associate with each index label. The following code demonstrates this by creating a Series with a scalar value and an index based on an already existing index:

In [115]:
s4 = pd.Series(2, index=b_series.index) # series with the constant 2 at every index label
s4

0    2
1    2
2    2
3    2
dtype: int64

index=b_series.index means that the index labels of s4 are the same as the index labels of b_series.

A ValueError is raised when the number of values does not match the number of index labels:

In [None]:
sda = pd.Series([1,2,3], index = b_series.index)  # raises ValueError: 3 values but 4 index labels
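
The failing cell above can be run safely by catching the exception. This is a minimal sketch with a made-up list of labels; it also shows that a scalar is the one case where the lengths do not have to match.

```python
import pandas as pd

labels = ["a", "b", "c", "d"]            # four labels

try:
    pd.Series([1, 2, 3], index=labels)   # only three values
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# a scalar is repeated for every label, so no length check applies
filled = pd.Series(0, index=labels)
print(filled)
```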

#### (c). Creating a series from a dictionary


In [116]:
calories = {"D1": 420, "D2": 380, "D3": 390}    
c_series= pd.Series(calories)
c_series

D1    420
D2    380
D3    390
dtype: int64

<mark> The dictionary keys become the Series labels. </mark>

#### (d). Creating a series from array


In [117]:
import numpy as np

In [118]:
arr = np.array(['a','r','r','a','y']) 
d_series = pd.Series(arr)
d_series

0    a
1    r
2    r
3    a
4    y
dtype: object

<mark> In dtype: object, object means the values are stored as generic Python objects, which is how pandas stores strings here. </mark>

In [119]:
d_series.index    # the index (labels) of the series can be retrieved with the .index property:

RangeIndex(start=0, stop=5, step=1)

In [120]:
d_series.values   # The array of values in the Series can be retrieved using the .values property

array(['a', 'r', 'r', 'a', 'y'], dtype=object)

### <div class= "alert alert-danger"> Naming a Series & its index

In [121]:
s_d = pd.Series([1,2,3,4,4],name = 'hayyyy' ) # naming a series 
s_d

0    1
1    2
2    3
3    4
4    4
Name: hayyyy, dtype: int64

In [122]:
ind = pd.Index(['a','b','c','d','e'], name= "Ii")  #index name
s_d = pd.Series([1,2,3,4,4],name = 'hayyyy',index = ind )            # naming a series 

s_d

Ii
a    1
b    2
c    3
d    4
e    4
Name: hayyyy, dtype: int64
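
Both names can be read back as attributes and changed without touching the original Series. The sketch below uses made-up names ("numbers", "letters", "counts", "keys") and relies on `rename()` and `rename_axis()`, which each return a new Series.

```python
import pandas as pd

s = pd.Series([1, 2, 3],
              index=pd.Index(["a", "b", "c"], name="letters"),
              name="numbers")

print(s.name)          # name of the Series
print(s.index.name)    # name of the index

renamed = s.rename("counts")       # new Series with a different name
relabeled = s.rename_axis("keys")  # new Series with a different index name

print(renamed.name, relabeled.index.name)
```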

### <div class= "alert alert-danger"> Accessing values of a series (looking up values in a series)
    
There are two ways to access the elements of a Series:

- Accessing elements by position
- Accessing elements by label (index)

#### (a). Accessing Element from Series with Position

To access a Series element by position, refer to its position number using the index operator [ ]. The position must be an integer. To access multiple elements from a Series, use a slice.

(For a Series with default index labels, the labels and the positions are the same: both run from 0 to n-1.)

In [123]:
d_series

0    a
1    r
2    r
3    a
4    y
dtype: object

In [124]:
d_series[0]

'a'

In [125]:
d_series[:3]

0    a
1    r
2    r
dtype: object

In [126]:
d_series[::2]

0    a
2    r
4    y
dtype: object

In [127]:
d_series[3]

'a'

#### (b). Accessing Element using labels

To see how access by label differs from access by position, we first have to replace the default index labels with custom ones. A Series is like a fixed-size dictionary in that you can get and set values by index label.

##### <div class= "alert alert-success"> Customizing labels
    
The default index labels created with a Series are integers (0 to n-1). To create user-defined labels that are meaningful to us, we can use the <span style="color:red"> index</span> parameter of the Series() constructor.

In [128]:
lst = [1, 7, 2, "apple", True, "hello" ]
myvar = pd.Series(lst, index = ["a", "b", "c", "x", "y", "z"])  # index argument, custom labels
myvar

a        1
b        7
c        2
x    apple
y     True
z    hello
dtype: object

In [129]:
myvar.values

array([1, 7, 2, 'apple', True, 'hello'], dtype=object)

In [130]:
myvar.index

Index(['a', 'b', 'c', 'x', 'y', 'z'], dtype='object')

<mark>The type of items in the index that are created are now of type object </mark>

pandas will create different index types based on the type of data identified in the index parameter. These different index types are optimized to perform indexing operations for that specific data type. To specify the index at the time of creation
of the Series, use the index parameter of the constructor.

In [131]:
myvar['a']    # if the label is a string, put it in quotes

1

In [132]:
myvar['y']     

True

In [133]:
myvar['a':'y']   # position-based slicing excludes the end element, whereas label-based slicing includes it

a        1
b        7
c        2
x    apple
y     True
dtype: object

In [134]:
myvar['a'::2]

a       1
c       2
y    True
dtype: object

In [135]:
myvar[:'d']

a    1
b    7
c    2
dtype: object

To retrieve multiple items, you can pass a list of index labels via the [] operator. Instead of a single value, the result will be a new Series with both index labels and values, and data copied from the original Series.

In [136]:
myvar[['a', 'c']]     # picking multiple items. 

a    1
c    2
dtype: object
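
To check the statement that selecting with a list of labels copies the data, the following sketch changes the selected subset and then prints the original. The Series `original` and its values are made up for this example.

```python
import pandas as pd

original = pd.Series([1, 7, 2], index=["a", "b", "c"])

subset = original[["a", "c"]]   # list of labels -> new Series
subset["a"] = 100               # change the subset only

print(subset)
print(original)                 # label 'a' still holds 1
```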

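<div class= "alert alert-warning">Even though we have changed the index labels, elements can still be accessed by integer position. Because the index contains no integer labels, pandas falls back to treating an integer key as a position. This fallback is deprecated in recent pandas versions, so prefer .iloc[] for positional access.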

In [137]:
print(myvar['a'])
print(myvar[0])

print(myvar['x'])
print(myvar[3])


1
1
apple
apple


<div class= "alert alert-warning">Items can be removed from a Series using the del statement and passing the index label to be removed:

In [138]:
del(d_series[0])
d_series

1    r
2    r
3    a
4    y
dtype: object

In [139]:
del(myvar['a'])
myvar

b        7
c        2
x    apple
y     True
z    hello
dtype: object

<div class= "alert alert-success">To alleviate the potential confusion in determining label-based lookup versus position-based lookup, index label based lookup can be enforced using the .loc[] accessor:
    

- .loc[]: forces lookup by index label

- .iloc[]: forces lookup by location/position

If a position passed to .iloc[] in a list is out of bounds, an IndexError is raised. In current pandas versions, .loc[] behaves the same way for labels: a label that does not exist raises a KeyError. (Older versions returned NaN for a missing label in a list.) To get NaN for missing labels, use .reindex() instead.
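
The error behavior of the two accessors can be checked with a short sketch. It assumes pandas 1.0 or newer, where `.loc[]` raises a KeyError for a missing label in a list; the Series `s` and its labels are made up for this example.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

try:
    s.iloc[[0, 5]]              # position 5 does not exist
    iloc_error = None
except IndexError as exc:
    iloc_error = type(exc).__name__

try:
    s.loc[["a", "z"]]           # label 'z' does not exist
    loc_error = None
except KeyError as exc:
    loc_error = type(exc).__name__

print(iloc_error, loc_error)

# reindex() is the tolerant alternative: missing labels become NaN
tolerant = s.reindex(["a", "z"])
print(tolerant)
```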

In [140]:

s5 = pd.Series([1, 'a', 3], index=[10,'index label',12])
s5

10             1
index label    a
12             3
dtype: object

In [141]:
s5.loc['index label']    # force lookup by index label

'a'

In [142]:
s5.iloc[1]    # forced lookup by location / position

'a'

In [143]:
s5.loc[[12, 10]]   # multiple items by label (loc)

12    3
10    1
dtype: object

In [144]:
s5.iloc[[0, 2]]   # multiple items by location / position (iloc)

10    1
12    3
dtype: object

In [145]:
myvar[:'d'] = "replaced"
myvar

b    replaced
c    replaced
x       apple
y        True
z       hello
dtype: object

In [146]:
myvar['d'] = np.nan      # adding new value (NaN) and index to a series
myvar

b    replaced
c    replaced
x       apple
y        True
z       hello
d         NaN
dtype: object

In [147]:
s_dummy = pd.Series([1,2,3,4,5,6,7,8,9,10])
s_dummy.loc[0::2]   # we can do slicing with loc[] too

0    1
2    3
4    5
6    7
8    9
dtype: int64

In [148]:
s_dummy.iloc[:6:2]   # we can do slicing with iloc[] too

0    1
2    3
4    5
dtype: int64

### <div class= "alert alert-danger"> Automatic alignment

A fundamental difference between a NumPy ndarray and a pandas Series is the ability of a Series to automatically align data from another Series based on label values before performing an operation.    

In [149]:
sa1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
sa2 = pd.Series([4, 3, 2, 1], index=['d', 'c', 'b', 'a'])
print("sa1:\n")
print(sa1)
print("sa2:\n")
print(sa2)

sa1:

a    1
b    2
c    3
d    4
dtype: int64
sa2:

d    4
c    3
b    2
a    1
dtype: int64


In [150]:
sa1 + sa2

a    2
b    4
c    6
d    8
dtype: int64

The process of adding two Series objects differs from the process of addition of arrays as it first aligns data based on index label values instead of simply applying the operation to elements in the same position. This becomes significantly powerful
when using pandas Series to combine data based on labels instead of having to first order the data manually.

Also worth noting is the order of the items in the index resulting from the addition. The two Series in the addition had the same labels but were ordered differently. The index in the result is arranged in ascending order.
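
To make the contrast with NumPy concrete, the sketch below adds the same two Series once as plain arrays (by position) and once as Series (by label). The variable names `by_position` and `by_label` are made up for this example.

```python
import pandas as pd

sa1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
sa2 = pd.Series([4, 3, 2, 1], index=["d", "c", "b", "a"])

by_position = sa1.to_numpy() + sa2.to_numpy()   # plain arrays: position only
by_label = sa1 + sa2                            # Series: matched on labels

print(by_position)   # every position adds up to 5
print(by_label)      # each label is added to itself: 2, 4, 6, 8
```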

### <div class= "alert alert-danger"> Arithematic operations

Arithmetic operations (+, -, /, *, and so on) can be applied either to a Series or between two Series objects. When applied to a single Series, the operation is applied to all of the values in that Series (a vectorized operation). The following code demonstrates arithmetic operations applied to a Series object by multiplying the values in b_series by 2.

#### <ins> Multiplying a series by a number or by a scalar series</ins> 

In [151]:
b_series *2

0    2
1    4
2    6
3    8
dtype: int64

In [152]:
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# scalar series using s's index

t = pd.Series(2, s.index)
s * t               

a    2
b    4
c    6
dtype: int64

To reinforce the point that alignment is being performed when applying arithmetic operations across two Series objects, look at the following two Series as examples:

####  <ins> Adding two series</ins> 

<b> 1. No matching labels: </b>

In [153]:
s1 = pd.Series([1,2,3,4], index = ['one','two','three','four'])
s2 = pd.Series([22,33,44,55], index = ['apples','oranges','bananas','gauvas'])
print("s1: \n")
print(s1)
print(" \n")
print("s2: \n")
print(s2)

s1: 

one      1
two      2
three    3
four     4
dtype: int64
 

s2: 

apples     22
oranges    33
bananas    44
gauvas     55
dtype: int64


In [154]:
s = s1 + s2  # the result dtype is float64 because NaN is a floating-point value
s

apples    NaN
bananas   NaN
four      NaN
gauvas    NaN
one       NaN
oranges   NaN
three     NaN
two       NaN
dtype: float64

The NaN value is, by default the result of any pandas arithmetic operation where an index label does not align with the other Series.

<div class= "alert alert-warning"> NaN: not a number

Values from two series are added only where their labels match; labels without a match produce NaN. To understand why, we must look at the "Automatic alignment of values" process illustrated in the figure below:
    
![alignmeny.png](attachment:alignmeny.png)    
 
    

- NaN + NaN = NaN
    
- NaN + number = NaN    
    
- number + NaN = NaN    
    
- Alignment works even when the labels of the two series are not equal.

- A label that is present in both series is matched and its values are combined; a label present in only one series yields NaN.
    
- Alignment is the reason why we use series and dataframes for data analysis and not arrays

The matching of labels and returning NaN where there are no matches is essential to how pandas operates as compared to arrays in NumPy. The tasks performed with pandas using Series (and DataFrame) objects are often such that multiple sets of data need to be aligned, and if there are no matching labels during alignment, then the operation should not fail. Hence, pandas returns NaN in those situations. 

This is actually common as datasets used in various statistical, financial, and data science domains often are incomplete, and more graceful techniques are required than to throw exceptions. pandas makes the assumption to return NaN in these cases. To facilitate handling of the NaN values in data and as the result of alignment, pandas changes the way that operations handle NaN by default.
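
Two examples of this NaN-aware behavior are sketched below: reductions such as `.sum()` and `.mean()` skip NaN by default, and the `.add()` method accepts a `fill_value` for labels missing on one side. The Series `left` and `right` and their values are made up for this example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

print(s.sum())               # NaN is skipped: 4.0
print(s.mean())              # mean of the two real values: 2.0
print(s.sum(skipna=False))   # NaN propagates when skipping is turned off

# fill_value treats a label missing from one side as 0 instead of NaN
left = pd.Series([1, 2], index=["a", "b"])
right = pd.Series([10, 20], index=["b", "c"])
total = left.add(right, fill_value=0)
print(total)
```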

<b> 2. some matching labels: </b>

In [155]:
s3 = pd.Series([1,2,3,4,5], index = ['apples','one','two','three','four'])
s3

apples    1
one       2
two       3
three     4
four      5
dtype: int64

In [156]:
s2 + s3          # the result index is the sorted union of the labels of s2 and s3

# reset_index() can be used afterwards to replace the index if needed

apples     23.0
bananas     NaN
four        NaN
gauvas      NaN
one         NaN
oranges     NaN
three       NaN
two         NaN
dtype: float64

<b> 3. repeated labels: </b>

The last example of alignment during arithmetical operations demonstrates the situation where the two Series objects have duplicate index labels. The following two Series objects each have two 'a' labels:

In [157]:
s4 = pd.Series([1.0, 2.0, 3.0], index=['a', 'a', 'b'])
s5 = pd.Series([4.0, 5.0, 6.0], index=['a', 'a', 'c'])
print("s4: \n")
print(s4)
print(" \n")
print("s5: \n")
print(s5)

s4: 

a    1.0
a    2.0
b    3.0
dtype: float64
 

s5: 

a    4.0
a    5.0
c    6.0
dtype: float64


In [158]:
s4 + s5

a    5.0
a    6.0
a    6.0
a    7.0
b    NaN
c    NaN
dtype: float64

The reason for this is that during alignment, pandas pairs every row with a given label in one Series with every row with the same label in the other Series (a Cartesian product per label), and then applies the operation to each pair. s4 contains two 'a' labels, and s5 also contains two 'a' labels, so every combination of them is calculated, resulting in four 'a' rows with the values 1+4, 1+5, 2+4 and 2+5. There is one 'b' label in s4 and one 'c' label in s5. Since neither has a matching label in the other Series, each produces a single NaN row in the result.

So, remember that an index can have duplicate labels, and during alignment this will
result in a number of index labels equivalent to the products of the number of the
labels in each Series.
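
The row count can be verified directly. The sketch below repeats the two Series from above and also uses the `is_unique` property of the index as a quick check for duplicate labels before combining data.

```python
import pandas as pd

s4 = pd.Series([1.0, 2.0, 3.0], index=["a", "a", "b"])
s5 = pd.Series([4.0, 5.0, 6.0], index=["a", "a", "c"])

result = s4 + s5

# 2 'a' labels x 2 'a' labels = 4 rows, plus one row each for 'b' and 'c'
print(len(result))
print((result.index == "a").sum())

# False here, which warns that alignment may multiply rows
print(s4.index.is_unique)
```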

### <div class= "alert alert-danger"> Size, shape, uniqueness, and counts of values in a series

In [159]:
s = pd.Series([0, 1, 1, 2, 3, 4, 5, 6, 7, np.nan])
s

0    0.0
1    1.0
2    1.0
3    2.0
4    3.0
5    4.0
6    5.0
7    6.0
8    7.0
9    NaN
dtype: float64

In [160]:
len(s)             # length of series

10

In [161]:
s.size           # we can also use size to find length of series

# .size is also the # of items in the Series

10

In [162]:
s.shape         #.shape is a tuple with one value

(10,)

The number of the values that are not part of the NaN can be found by using the .count() method:

In [163]:
s.count()      # count() returns the number of non-NaN values

9

To determine all of the unique values in a Series, pandas provides the .unique() method:

In [164]:
s.unique()

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7., nan])

In [165]:
s.value_counts()     # count of non-NaN values, returned max to min order

1.0    2
0.0    1
2.0    1
3.0    1
4.0    1
5.0    1
6.0    1
7.0    1
dtype: int64

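As the previous cell shows, the count of each unique item in a Series can be obtained using .value_counts(). By default it leaves out NaN and sorts the result from the most frequent value to the least frequent.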
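
If NaN should be counted as well, `.value_counts()` accepts `dropna=False`. The following sketch, on a small made-up Series, compares both variants and also shows `.nunique()`, which returns the number of distinct non-NaN values.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1, 1, 2, np.nan])

counts = s.value_counts()                        # NaN excluded
counts_with_nan = s.value_counts(dropna=False)   # NaN gets its own row

print(counts)
print(counts_with_nan)
print(s.nunique())                               # distinct non-NaN values
```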

### <div class= "alert alert-danger">Peeking at data (heads, tails, and take)

pandas provides the .head() and .tail() methods to examine just the first few, or last, records in a Series. By default, these return the first or last five rows,
respectively, but you can use the n parameter or just pass an integer to specify the number of rows:

In [166]:
s.head()   # first five

0    0.0
1    1.0
2    1.0
3    2.0
4    3.0
dtype: float64

In [167]:
# first three
s.head(n = 3) # s.head(3) is equivalent

0    0.0
1    1.0
2    1.0
dtype: float64

In [168]:
# last five
s.tail()

5    4.0
6    5.0
7    6.0
8    7.0
9    NaN
dtype: float64

In [169]:
# last 3
s.tail(n = 3) # equivalent to s.tail(3)

7    6.0
8    7.0
9    NaN
dtype: float64

The .take() method will return the rows in a series that correspond to the zero-based positions specified in a list:

In [170]:
# only take specific items
s.take([0, 3, 9])

0    0.0
3    2.0
9    NaN
dtype: float64

### <div class= "alert alert-danger"> Boolean selection
    
Items in a Series can be selected, based on the value instead of index labels, via the utilization of a Boolean selection. A Boolean selection applies a logical expression to the values of the Series and returns a new Series of Boolean values representing the result for each value. 
    
The following code demonstrates identifying items in a Series where the values are greater than 5:

In [171]:
sb = pd.Series(np.arange(0, 10))
sb

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int32

In [172]:
sb > 5                             # which rows have values that are > 5?

0    False
1    False
2    False
3    False
4    False
5    False
6     True
7     True
8     True
9     True
dtype: bool

To obtain the rows in the Series where the logical expression is True, simply pass the result of the Boolean expression to the [ ] operator of the Series. The result will be a new Series with a copy of index and value for the selected rows:

In [173]:
sb[sb > 5]    # select rows where values are > 5 

6    6
7    7
8    8
9    9
dtype: int32

pandas performs this Boolean selection by overloading the Series object's [ ] operator: when passed a Series of Boolean values, it returns only the items of the outer Series (in this case sb) whose labels have a True value in the Boolean Series. This turns out to be very valuable and efficient in expressing many types of data analysis algorithms, and very convenient for extracting subsets of data based on its contents.

Unfortunately, Python's and and or keywords cannot be used to combine these conditions. As an example,<b> sb[sb > 5 and sb < 8] </b> raises a ValueError.

The technical reason is that and and or ask Python for a single truth value of each operand, and the truth value of a whole Series is ambiguous. The solution is to put parentheses around each logical condition and use the element-wise operators instead: '&' for and, '|' for or.

In [174]:
sb[(sb > 5) & (sb < 8)]

6    6
7    7
dtype: int32
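
The failing expression and the other element-wise operators can be tried out as follows. This is a minimal sketch; the names `and_error`, `either`, and `negated` are made up for this example.

```python
import numpy as np
import pandas as pd

sb = pd.Series(np.arange(0, 10))

try:
    sb[sb > 5 and sb < 8]        # 'and' needs one truth value for a whole Series
    and_error = None
except ValueError as exc:
    and_error = type(exc).__name__
print(and_error)

either = sb[(sb < 2) | (sb > 7)]   # | is the element-wise "or"
negated = sb[~(sb > 5)]            # ~ is the element-wise "not"

print(either.tolist())    # [0, 1, 8, 9]
print(negated.tolist())   # [0, 1, 2, 3, 4, 5]
```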

It is possible to determine whether all the values in a Series match a given expression using the .all() method. The following asks if all elements in the series are greater than or equal to 0:

In [175]:
(sb >= 0).all() # are all items >= 0?

True

The .any() method returns True if any values satisfy the expressions. The following asks if any elements are less than 2:

In [176]:
(sb < 2).any()  # any items < 2?

True

There is something important going on here that is worth mentioning. The result of these logical expressions is a Boolean selection, a Series of True and False values. The .sum() method of a Series, when given a series of Boolean values, will treat
True as 1 and False as 0. The following demonstrates using this to determine the number of items in a Series that satisfy a given expression:

In [177]:
(sb < 2).sum()     # how many values < 2?

2

### <div class= "alert alert-danger"> Reindexing a series

Reindexing in pandas is a process that makes the data in a Series or DataFrame match a given set of labels. This is core to the functionality of pandas as it enables label alignment across multiple objects, which may originally have different indexing schemes.

This process of performing a reindex includes the following steps:
1. Reordering existing data to match a set of labels.
2. Inserting NaN markers where no data exists for a label.
3. Possibly, filling missing data for a label using some type of logic (defaulting to adding NaN values).
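
The three steps above can be sketched on a small made-up Series. The fill logic used here is forward filling (`method="ffill"`), which is only one of several options and requires the index to be sorted first.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=[4, 0, 2])

# step 1: reorder existing data to match the given labels
reordered = s.reindex([0, 2, 4])

# step 2: labels without data receive NaN
with_gaps = s.reindex([0, 1, 2, 3, 4])

# step 3: fill the gaps, here by carrying the previous value forward
filled = s.sort_index().reindex([0, 1, 2, 3, 4], method="ffill")

print(reordered.tolist())   # [20, 30, 10]
print(with_gaps)
print(filled.tolist())      # [20, 20, 30, 30, 10]
```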

The following Series has an index with numerical values, and the index is modified to be alphabetic by simply assigning a list of characters to the .index property. This makes the values accessible via the character labels in the new index:

In [178]:
s = pd.Series(np.random.randn(5))           # five random numbers drawn from the standard normal distribution
s

0    1.212112
1   -0.173215
2    0.119209
3   -1.044236
4   -0.861849
dtype: float64

In [179]:
s.index = ['a', 'b', 'c', 'd', 'e']
s

a    1.212112
b   -0.173215
c    0.119209
d   -1.044236
e   -0.861849
dtype: float64

Let's examine a slightly more practical example. The following code concatenates two Series objects resulting in duplicate index labels, which may not be desired in the resulting Series:

In [180]:
# concat copies index values verbatim,
# potentially making duplicates
np.random.seed(123456)
s1 = pd.Series(np.random.randn(3))
s2 = pd.Series(np.random.randn(3))
combined = pd.concat([s1, s2])
combined

0    0.469112
1   -0.282863
2   -1.509059
0   -1.135632
1    1.212112
2   -0.173215
dtype: float64

To fix this, the following creates a new index for the concatenated result which has sequential and distinct values.

In [181]:
combined.index = np.arange(0, len(combined))
combined

0    0.469112
1   -0.282863
2   -1.509059
3   -1.135632
4    1.212112
5   -0.173215
dtype: float64

Greater flexibility in creating a new index is provided using the .reindex() method. An example of the flexibility of .reindex() over assigning the .index property directly is that the list provided to <span style = "color:red">.reindex()</span> can be of a different length than the number of rows in the Series:

In [182]:
np.random.seed(123456)
s1 = pd.Series(np.random.randn(4), ['a', 'b', 'c', 'd'])
# reindex with different number of labels
# results in dropped rows and/or NaN's
s2 = s1.reindex(['a', 'c', 'g'])
s2

a    0.469112
c   -1.509059
g         NaN
dtype: float64

There are several things here that are important to point out about .reindex():
1. The result of a .reindex() method is a new Series. This new Series has an index with labels that are provided as the parameter to .reindex(). 
2. For each item in the given parameter list, if the original Series contains that label, then the value is assigned to that label. If the label does not exist in the original Series, pandas assigns a NaN value.
3. Rows in the Series whose labels are not specified in the parameter of .reindex() are not included in the result.

To demonstrate that the result of .reindex() is a new Series object, changing a value in s2 does not change the values in s1:

In [183]:
# s2 is a different Series than s1
s2['a'] = 0
s2

a    0.000000
c   -1.509059
g         NaN
dtype: float64

In [184]:
# this did not modify s1
s1

a    0.469112
b   -0.282863
c   -1.509059
d   -1.135632
dtype: float64

Reindexing is also useful when you want to align two Series to perform an operation on matching elements from each series; however, for some reason, the two Series had index labels that will not initially align. 

The following example demonstrates this, where the first Series has indexes as sequential integers, but the second has a string representation of what would be
the same values:

In [185]:
# different types for the same values of labels
# causes big trouble
s1 = pd.Series([0, 1, 2], index=[0, 1, 2])
s2 = pd.Series([3, 4, 5], index=['0', '1', '2'])
s1 + s2

0   NaN
1   NaN
2   NaN
0   NaN
1   NaN
2   NaN
dtype: float64

This is almost a catastrophic failure in accomplishing the desired result, and exemplifies a scenario where data may have been retrieved from two different systems that used different representations for the index labels. The reasons why this happens in pandas are as follows:

1. pandas first tries to align by the index labels and finds no matches, because the integer 0 and the string '0' are different labels.

2. The index of the result is therefore the union of both indexes: the integer labels 0, 1, 2 followed by the string labels '0', '1', '2'. They are printed identically, so the index only appears to contain duplicate values.

3. Finally, all values are NaN because each label exists in only one of the two series. For example, the operation takes the item in the first series with the integer label 0, which has a value of 0, but cannot find that label in the other series, so the result is NaN (and this happens six times in this case).

Once this situation is identified, it is fairly trivial to fix by converting the index labels of the second series to integers:

In [186]:
# convert the string labels to integers
# and we will get the desired result
s2.index = s2.index.values.astype(int)
s1 + s2

0    3
1    5
2    7
dtype: int64

The default action of inserting NaN as a missing value during reindexing can be changed by using the <span style = "color:green"> fill_value </span>parameter of the method.
The following example demonstrates using 0 instead of NaN:

In [187]:
# fill with 0 instead of NaN
s2 = s.copy()
s2.reindex(['a', 'f'], fill_value=0)

a    1.212112
f    0.000000
dtype: float64

<div class= "alert alert-warning">  reset_index()

In [191]:
np.random.seed(123456)
s1 = pd.Series(np.random.randn(3))
s2 = pd.Series(np.random.randn(3))
combined = pd.concat([s1, s2])
combined

0    0.469112
1   -0.282863
2   -1.509059
0   -1.135632
1    1.212112
2   -0.173215
dtype: float64

In [193]:
combined.reset_index()

   index         0
0      0  0.469112
1      1 -0.282863
2      2 -1.509059
3      0 -1.135632
4      1  1.212112
5      2 -0.173215


The result now contains both the new and the old index. To drop the old index, use the drop=True parameter:

In [194]:
combined.reset_index(drop=True)

0    0.469112
1   -0.282863
2   -1.509059
3   -1.135632
4    1.212112
5   -0.173215
dtype: float64

In [196]:
newCom = combined.copy()   # independent copy of the data
oldCom  = newCom   # second reference to the same object as newCom
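
The difference between the two kinds of copy becomes visible as soon as a value is changed. The sketch below uses a small made-up Series in place of `combined` and the hypothetical names `new_com` and `old_com`.

```python
import pandas as pd

combined = pd.Series([1.0, 2.0, 3.0])

new_com = combined.copy()   # independent copy of the data
old_com = new_com           # second name for the same object

old_com.iloc[0] = 99.0      # change made through the second name

print(old_com is new_com)   # True: one object, two names
print(new_com.iloc[0])      # 99.0: the change is visible through both names
print(combined.iloc[0])     # 1.0: the original is untouched
```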

#### <div class= "alert alert-info"> Definitions

- Data science: 

The branch of computer science that studies how to store, use, and analyze data in order to derive information from it.
- hashable: 

Hashing is a concept in computer science used to build high-performance data structures in which large amounts of data can be stored and accessed quickly. For example, suppose you have 10,000 phone numbers and want to store them in an array (a sequential data structure that stores data in contiguous memory locations and provides random access), but you do not have that many contiguous memory locations available.
Instead, you can use an array of size 100 and a hash function that maps each value to one of the 100 indices; values that map to the same index are stored together in a linked list. This gives performance close to that of an array.
A hash function can be as simple as dividing the number by the size of the array and taking the remainder as the index.
In Python, an object is hashable if its hash value never changes during its lifetime. Immutable built-in objects such as numbers, strings, and tuples of immutable items are hashable; mutable containers such as lists and dictionaries are not.
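
The phone number example can be sketched in a few lines of Python. This is only an illustration: the names `NUM_BUCKETS`, `bucket_for`, and `contains` and the phone numbers are made up, and plain Python lists stand in for the linked lists.

```python
NUM_BUCKETS = 100

def bucket_for(number):
    """Hash function: remainder after dividing by the array size."""
    return number % NUM_BUCKETS

# an array of 100 buckets; each bucket chains the values that share an index
buckets = [[] for _ in range(NUM_BUCKETS)]
for phone in (5550123, 5550223, 5559999):
    buckets[bucket_for(phone)].append(phone)

def contains(phone):
    """Look only inside the one bucket the number hashes to."""
    return phone in buckets[bucket_for(phone)]

print(buckets[23])          # [5550123, 5550223] share bucket 23
print(contains(5559999))    # True

# immutable objects are hashable, mutable containers are not
tuple_hash = hash((1, 2))
try:
    hash([1, 2])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
print(list_is_hashable)     # False
```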