## Pandas

### Data Preparation 
### Data Manipulation 
### Data Modelling 
### Data Analysis 


# Notebook Contents

##### <span style="color:green">1. Why are we studying Series?</span>
##### <span style="color:green">2. Series data structure</span>
##### <span style="color:green">3. Methods or functions</span>
##### <span style="color:green">4. pandas.Series.apply()</span>

We study Series first because it is the building block of the DataFrame: each column of a DataFrame is a Series.
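The relationship between the two structures can be sketched as follows. The column names and values are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical example data, not taken from the notebook
df = pd.DataFrame({"price": [100, 250, 300], "volume": [10, 20, 30]})

# Selecting a single column returns a Series
price = df["price"]
print(type(price))

# The column shares its index with the parent DataFrame
print(price.index.equals(df.index))
```

Anything learned about a Series (indexing, arithmetic, missing values) therefore carries over to individual DataFrame columns.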

In [2]:
import pandas as pd
import numpy as np

my_series = pd.Series([10,20,44,52,22,36])

print(my_series)
type(my_series)

0    10
1    20
2    44
3    52
4    22
5    36
dtype: int64


pandas.core.series.Series

In [3]:
my_series = pd.Series([10.2,20.6,44,52,22,36])

print(my_series)
type(my_series)

0    10.2
1    20.6
2    44.0
3    52.0
4    22.0
5    36.0
dtype: float64


pandas.core.series.Series

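You can see that printing a Series shows the index, the values, and the data type of the values. The first Series contains only integers, so its dtype is `int64`; the second mixes integers and floats, so pandas converts every value to `float64`.

A Series can hold any data type, for example integers, floats, strings and so on. It can also hold values of several types at once, in which case its dtype is `object`.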

In [4]:
my_series = pd.Series(['Ram','algo'])

print(my_series)
type(my_series)

0     Ram
1    algo
dtype: object


pandas.core.series.Series

Whenever a Series holds strings, pandas reports its dtype as `object` (in the pandas version used here), as the output above shows.
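A minimal sketch of a Series that mixes types, assuming a small made-up list of values: the dtype falls back to `object`, and each element keeps its own Python type.

```python
import pandas as pd

# Hypothetical mixed-type data: an int, a float and a string
mixed = pd.Series([10, 20.6, "Ram"])

# With mixed types, pandas uses the generic object dtype
print(mixed.dtype)

# Each element is still stored as its original Python type
element_types = [type(value).__name__ for value in mixed]
print(element_types)
```

Because `object` columns store generic Python objects, numeric operations on them are usually slower than on `int64` or `float64` Series.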

In [5]:
# Defining Series objects with custom index labels

countries = ['India', 'USA', 'Japan', 'Russia', 'China']
leaders = ['Narendra Modi', 'Donald Trump', 'Shinzo Abe', 'Vladimir Putin', 'Xi Jinping']

# The index is passed explicitly here via the `index` argument
S = pd.Series(leaders, index=countries, dtype=object)
S


India      Narendra Modi
USA         Donald Trump
Japan         Shinzo Abe
Russia    Vladimir Putin
China         Xi Jinping
dtype: object

In [6]:
# Have a look at the series S1

stocks_set1 = ['Alphabet', 'IBM', 'Tesla', 'Infosys']

# Here, we are inserting data as a list in series constructor, but the argument of its index is passed as a pre-defined list

S1 = pd.Series([100, 250, 300, 500], index = stocks_set1)
print (S1)
print ("\n")

# Now, have a look at the series S2

stocks_set2 = ['Alphabet', 'IBM', 'Tesla', 'Infosys']

# Here, we are inserting data as a list in series constructor, but the argument of its index is passed as a pre-defined list

S2 = pd.Series([500, 400, 110, 700], index = stocks_set2)

print (S2)
print ("\n")

# We will add Series S1 and S2

print (S1 + S2)

Alphabet    100
IBM         250
Tesla       300
Infosys     500
dtype: int64


Alphabet    500
IBM         400
Tesla       110
Infosys     700
dtype: int64


Alphabet     600
IBM          650
Tesla        410
Infosys     1200
dtype: int64
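The addition above matches values by index label, not by position. A small sketch, using made-up Series whose labels appear in a different order, illustrates this:

```python
import pandas as pd

# Same labels, different order (hypothetical values)
a = pd.Series([100, 250, 300], index=["Alphabet", "IBM", "Tesla"])
b = pd.Series([110, 500, 400], index=["Tesla", "Alphabet", "IBM"])

# Values are paired by label before being added
total = a + b
print(total)
```

Here `total["Alphabet"]` is 100 + 500, even though those two values sit at different positions in `a` and `b`.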


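# Adding Series that have different indexes will create 'NaN' values

1) <span style="color:red">The result index is the union of both indexes; when the labels are strings, it is sorted alphabetically.</span>
    
    **NaN stands for "Not a Number".** Labels present in only one Series get `NaN`, and the result dtype becomes `float64` because `NaN` is a floating-point value.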

In [7]:
# Adding Series that have different indexes will create 'NaN' values

stocks_set1 = ['Alphabet', 'IBM', 'Tesla', 'Infosys']
stocks_set2 = ['Alphabet', 'Facebook', 'Tesla', 'Infosys']

S3 = pd.Series([100, 250, 300, 500], index = stocks_set1)
S4 = pd.Series([500, 700, 110, 700], index = stocks_set2)


print (S3)
print("\n")

print (S4)
print("\n")

print(S3+S4)

Alphabet    100
IBM         250
Tesla       300
Infosys     500
dtype: int64


Alphabet    500
Facebook    700
Tesla       110
Infosys     700
dtype: int64


Alphabet     600.0
Facebook       NaN
IBM            NaN
Infosys     1200.0
Tesla        410.0
dtype: float64
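If a missing label should count as zero instead of producing `NaN`, one option is `Series.add()` with its `fill_value` argument. The sketch below rebuilds `S3` and `S4` so that it runs on its own:

```python
import pandas as pd

S3 = pd.Series([100, 250, 300, 500],
               index=["Alphabet", "IBM", "Tesla", "Infosys"])
S4 = pd.Series([500, 700, 110, 700],
               index=["Alphabet", "Facebook", "Tesla", "Infosys"])

# A label missing from one Series is treated as 0 before adding
combined = S3.add(S4, fill_value=0)
print(combined)
```

Whether zero is a sensible stand-in depends on the data; for prices or measurements, leaving the `NaN` in place is often the safer choice.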


## Methods or functions

We will have a look at a few important attributes and methods of a Series.

`Series.index` returns the index labels of the Series.

In [32]:
My_Series = pd.Series ([10,20,30,40,50]) 

print (My_Series.index)
print(S3.index)


RangeIndex(start=0, stop=5, step=1)
Index(['Alphabet', 'IBM', 'Tesla', 'Infosys'], dtype='object')


In [35]:
print(S3.values)

S3.head()

[100 250 300 500]


Alphabet    100
IBM         250
Tesla       300
Infosys     500
dtype: int64

In [36]:
# Returns whether each value is null. If it is True, the value at that index is NaN

(S3 + S4).isnull()

Alphabet    False
Facebook     True
IBM          True
Infosys     False
Tesla       False
dtype: bool
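The boolean mask returned by `isnull()` is also useful for counting and locating missing values. A minimal sketch, rebuilding the two Series so the block is self-contained:

```python
import pandas as pd

S3 = pd.Series([100, 250, 300, 500],
               index=["Alphabet", "IBM", "Tesla", "Infosys"])
S4 = pd.Series([500, 700, 110, 700],
               index=["Alphabet", "Facebook", "Tesla", "Infosys"])
total = S3 + S4

# True counts as 1, so summing the mask counts the NaN values
n_missing = int(total.isnull().sum())
print(n_missing)

# Use the mask to select only the labels whose value is NaN
missing_labels = total[total.isnull()].index.tolist()
print(missing_labels)
```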

## Data Manipulation 

- Using `dropna()` and `fillna()` to handle NaN values in the data

In [43]:
S5 = S3+S4

S5.head()


Alphabet     600.0
Facebook       NaN
IBM            NaN
Infosys     1200.0
Tesla        410.0
dtype: float64

In [50]:
S5.dropna()

Alphabet     600.0
Infosys     1200.0
Tesla        410.0
dtype: float64

In [51]:
S5.head()

Alphabet     600.0
Facebook       NaN
IBM            NaN
Infosys     1200.0
Tesla        410.0
dtype: float64
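Note that `S5.head()` still shows the `NaN` rows after `S5.dropna()` was called: `dropna()` returns a new Series and leaves the original untouched. A short sketch of keeping the cleaned result, with `cleaned` as an illustrative variable name:

```python
import pandas as pd

S3 = pd.Series([100, 250, 300, 500],
               index=["Alphabet", "IBM", "Tesla", "Infosys"])
S4 = pd.Series([500, 700, 110, 700],
               index=["Alphabet", "Facebook", "Tesla", "Infosys"])
S5 = S3 + S4

# dropna() returns a new Series; assign it to keep the result
cleaned = S5.dropna()

print(len(S5))       # original still has every label
print(len(cleaned))  # rows with NaN are gone in the copy
```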

In [55]:
S5.fillna(method = "ffill")

Alphabet     600.0
Facebook     600.0
IBM          600.0
Infosys     1200.0
Tesla        410.0
dtype: float64

Signature:
S5.fillna(
    value=None,
    method=None,
    axis=None,
    inplace=False,
    limit=None,
    downcast=None,
) -> Union[ForwardRef('Series'), NoneType]

value : scalar, dict, Series, or DataFrame
    Value to use to fill holes (e.g. 0), alternately a
    dict/Series/DataFrame of values specifying which value to use for
    each index (for a Series) or column (for a DataFrame).  Values not
    in the dict/Series/DataFrame will not be filled. This value cannot
    be a list.
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
    Method to use for filling holes in reindexed Series
    pad / ffill: propagate last valid observation forward to next valid
    backfill / bfill: use next valid observation to fill gap.
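The options described in the signature can be compared side by side. The sketch below uses `Series.ffill()` and `Series.bfill()`, since pandas releases from 2.1 onward deprecate the `method` argument of `fillna()` in favour of these dedicated methods.

```python
import pandas as pd

S3 = pd.Series([100, 250, 300, 500],
               index=["Alphabet", "IBM", "Tesla", "Infosys"])
S4 = pd.Series([500, 700, 110, 700],
               index=["Alphabet", "Facebook", "Tesla", "Infosys"])
S5 = S3 + S4

# Replace every NaN with a fixed scalar value
filled_zero = S5.fillna(0)

# Forward fill: propagate the last valid value forward
forward = S5.ffill()

# Backward fill: use the next valid value to fill the gap
backward = S5.bfill()

print(filled_zero)
print(forward)
print(backward)
```

Forward and backward fill depend on the order of the rows, so they make most sense for ordered data such as time series; for unordered labels like company names, a fixed value or `dropna()` is usually easier to justify.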

## pandas.Series.apply()

To apply a function to every value of a Series, for example to take the sine of each value, use `Series.apply()`.
<br>
<b>Series.apply(func)</b>
<br>
func = a Python function that is applied to every single value of the Series

In [56]:
my_series.head()

0    10
1    20
2    44
3    52
4    22
dtype: int64

In [62]:
my_series.apply(np.sin)

0   -0.544021
1    0.912945
2    0.017702
3    0.986628
4   -0.008851
5   -0.991779
dtype: float64
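`apply()` also accepts user-defined functions and lambdas. The sketch below uses an illustrative doubling function and, for comparison, calls `np.sin` on the whole Series directly, which gives the same values as `apply(np.sin)`.

```python
import numpy as np
import pandas as pd

my_series = pd.Series([10, 20, 44, 52, 22, 36])

# Apply a custom function (here a lambda) to each value
doubled = my_series.apply(lambda x: x * 2)
print(doubled)

# NumPy functions can also be called on the Series as a whole
sines_apply = my_series.apply(np.sin)
sines_vectorized = np.sin(my_series)
print(np.allclose(sines_apply, sines_vectorized))
```

For functions that NumPy already provides, the vectorized call is usually the faster option; `apply()` is most useful for custom Python logic.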