![](logo.png)

# Day Objectives
## Pandas
- Pandas is a third-party Python library (not part of the standard library) used for data analysis. You'll use pandas heavily for data manipulation and for preparing data for visualisation, machine learning models, etc.
- Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.
- There are two main data structures in pandas: Series and DataFrame. Tabular data is normally stored in a DataFrame, so manipulating DataFrames quickly is probably the most important skill for data analysis.
- Source: https://pandas.pydata.org/pandas-docs/stable/overview.html
## Pandas Series
- A Series is similar to a 1-D NumPy array with an index attached, and usually holds values of one type (numeric, string, datetime, etc.); mixed values are stored with the generic `object` dtype. A DataFrame is a table in which each column is a pandas Series.
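
The relationship between the two structures can be sketched as follows. This is a minimal illustration with made-up column names (`name`, `s_no`), not data from the session.

```python
import pandas as pd

# A small table built from a dict of equal-length lists (illustrative data).
df = pd.DataFrame({"name": ["Mercy", "Cherry", "Raju"], "s_no": [1, 2, 3]})

# Selecting one column gives back a Series that shares the DataFrame's index.
col = df["s_no"]
print(type(col).__name__)
print(col.index.equals(df.index))
```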

## Creating Series
- List
- Tuple
- Dictionary
- Numpy
- Date_Range
- Series Indexing

## Data Analysis with Pandas


|S.No |Name |Gender|
|--|--|--|
|1 | Mercy | Female|
|2 | Cherry | Male |
|3 | Raju | Male |


* Pandas DataFrame
* Combining & Merging
* File I/O
* Indexing


In [1]:
pip install pandas




In [3]:
import pandas  as pd

In [4]:
pd.__version__

'1.0.5'

## Creating Series

In [5]:
# convert list into pandas series
li = [213,345,456,6,776,4564,534345]
s1 = pd.Series(li)
s1
# each value in the Series has an index label
# the default index runs from 0 to n-1

0       213
1       345
2       456
3         6
4       776
5      4564
6    534345
dtype: int64

In [6]:
# converting a tuple into a Series
t = (123,34,345.45,"APSSDC")
s2 = pd.Series(t)
s2

0       123
1        34
2    345.45
3    APSSDC
dtype: object
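
The `object` dtype above comes from mixing numbers with a string. A small sketch of how the inferred dtype changes with the values (the variable names are illustrative only):

```python
import pandas as pd

ints = pd.Series([1, 2, 3])            # all integers
mixed_num = pd.Series([1, 2, 3.5])     # ints and a float are upcast to float64
mixed_obj = pd.Series([1, 2.5, "x"])   # numbers plus a string fall back to object

for name, s in [("ints", ints), ("mixed_num", mixed_num), ("mixed_obj", mixed_obj)]:
    print(name, s.dtype)
```

Numeric dtypes support fast vectorised arithmetic, so it is usually worth keeping a column to a single type where possible.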

In [7]:
# converting Dict into Series
di = {"a":112,"b":345,"c":68}
s3 = pd.Series(di)
s3
# the dict keys become the index labels

a    112
b    345
c     68
dtype: int64

In [10]:
# changing index values
l = [3234,456,34.8]
s4 = pd.Series(l,index = ["x",456,45.4])
s4
# index labels can be of mixed types (any hashable value)

x       3234.0
456      456.0
45.4      34.8
dtype: float64

In [11]:
# converting a NumPy array into a Series
import numpy as np
n = np.array(l)
s5 = pd.Series(n)
s5

0    3234.0
1     456.0
2      34.8
dtype: float64

In [12]:
# date_range returns a DatetimeIndex (one entry per day by default)

s6 = pd.date_range(start = "2021-06-01",end = "2021-07-15")
s6

DatetimeIndex(['2021-06-01', '2021-06-02', '2021-06-03', '2021-06-04',
               '2021-06-05', '2021-06-06', '2021-06-07', '2021-06-08',
               '2021-06-09', '2021-06-10', '2021-06-11', '2021-06-12',
               '2021-06-13', '2021-06-14', '2021-06-15', '2021-06-16',
               '2021-06-17', '2021-06-18', '2021-06-19', '2021-06-20',
               '2021-06-21', '2021-06-22', '2021-06-23', '2021-06-24',
               '2021-06-25', '2021-06-26', '2021-06-27', '2021-06-28',
               '2021-06-29', '2021-06-30', '2021-07-01', '2021-07-02',
               '2021-07-03', '2021-07-04', '2021-07-05', '2021-07-06',
               '2021-07-07', '2021-07-08', '2021-07-09', '2021-07-10',
               '2021-07-11', '2021-07-12', '2021-07-13', '2021-07-14',
               '2021-07-15'],
              dtype='datetime64[ns]', freq='D')

In [14]:
help(pd.date_range)

Help on function date_range in module pandas.core.indexes.datetimes:

date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs) -> pandas.core.indexes.datetimes.DatetimeIndex
    Return a fixed frequency DatetimeIndex.
    
    Parameters
    ----------
    start : str or datetime-like, optional
        Left bound for generating dates.
    end : str or datetime-like, optional
        Right bound for generating dates.
    periods : int, optional
        Number of periods to generate.
    freq : str or DateOffset, default 'D'
        Frequency strings can have multiples, e.g. '5H'. See
        :ref:`here <timeseries.offset_aliases>` for a list of
        frequency aliases.
    tz : str or tzinfo, optional
        Time zone name for returning localized DatetimeIndex, for example
        'Asia/Hong_Kong'. By default, the resulting DatetimeIndex is
        timezone-naive.
    normalize : bool, default False
        Normalize start/
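
The `start`, `end`, `periods` and `freq` parameters listed in the help text can be combined in a few ways. A minimal sketch, with illustrative variable names and values:

```python
import pandas as pd

# Seven consecutive days starting on 2021-06-01 (freq defaults to "D").
days = pd.date_range(start="2021-06-01", periods=7)

# Every second day between two bounds; an end point is included when it falls on the grid.
every_other = pd.date_range(start="2021-06-01", end="2021-06-09", freq="2D")

# date_range returns a DatetimeIndex; here it is used as the index of a Series.
readings = pd.Series(range(7), index=days)
print(readings.loc[pd.Timestamp("2021-06-03")])
```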

# Pandas Series Indexing

In [17]:
s1[0]

213

In [22]:
s1[::-1] # reverse of the series 

6    534345
5      4564
4       776
3         6
2       456
1       345
0       213
dtype: int64

In [24]:
s1[::2]

0       213
2       456
4       776
6    534345
dtype: int64

In [25]:
s2[1::2]

1        34
3    APSSDC
dtype: object

In [26]:
s1

0       213
1       345
2       456
3         6
4       776
5      4564
6    534345
dtype: int64

In [27]:
s1[3:]

3         6
4       776
5      4564
6    534345
dtype: int64

In [28]:
# access the values at index 3, 6, 2 and 4
# fancy indexing: pass a list of index labels
s1[[3,6,2,4]]
# the result follows the order of the list

3         6
6    534345
2       456
4       776
dtype: int64

In [29]:
s3

a    112
b    345
c     68
dtype: int64

In [30]:
s3["a"] # explicit indexing (by label)

112

In [31]:
s3["c"]

68

In [32]:
s3[0]  # implicit indexing (by position); newer pandas versions deprecate this, prefer s3.iloc[0]

112
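
The difference between label-based and position-based access can be made explicit with `.loc` and `.iloc`. A short sketch using the same values as `s3`:

```python
import pandas as pd

s = pd.Series([112, 345, 68], index=["a", "b", "c"])

print(s.loc["a"])      # label-based lookup
print(s.iloc[0])       # position-based lookup
print(s.loc["a":"b"])  # label slices include the end label
print(s.iloc[0:1])     # position slices exclude the stop position
```

Using `.loc` and `.iloc` avoids ambiguity when the index labels are themselves integers.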

In [34]:
# converting Dict into Series
di = {"a":112,"b":345,"d":np.nan,"c":68}
s7 = pd.Series(di)
s7
# NaN - not a number - a special type of float value

a    112.0
b    345.0
d      NaN
c     68.0
dtype: float64

In [35]:
s8 = pd.Series(di,index = ["a","d","c"])
s8

a    112.0
d      NaN
c     68.0
dtype: float64

In [36]:
s9 = pd.Series("SRM",index = [290,392,435,234,324])
s9

290    SRM
392    SRM
435    SRM
234    SRM
324    SRM
dtype: object
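
The last two cells rely on how the `index` argument interacts with the data. A minimal sketch with illustrative names: keys missing from the dict produce `NaN`, and a scalar is repeated for every label.

```python
import pandas as pd

prices = {"a": 112, "b": 345, "c": 68}

# Only the requested keys are kept, in the requested order; "z" is not in the dict.
picked = pd.Series(prices, index=["c", "a", "z"])
print(picked)

# A scalar is repeated once per index label.
filled = pd.Series(0, index=["x", "y"])
print(filled)
```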

# Task
- Generate the multiplication table of n using a pandas Series

1 -- 5

2 -- 10

3 -- 15

In [37]:
n = int(input())  # table number, e.g. 5
ls = list(range(n, n * 10 + 1, n))  # n, 2n, ..., 10n
s = pd.Series(ls, index=range(1, 11))  # label the rows 1 to 10
print(s)

5
1      5
2     10
3     15
4     20
5     25
6     30
7     35
8     40
9     45
10    50
dtype: int64


In [38]:
di = {1:5,2:10,3:15}
s7 = pd.Series(di)
s7
# here both the boundaries and the table number are hard-coded

1     5
2    10
3    15
dtype: int64

In [39]:
list1 = [x * 5 for x in range(1, 11)]
s1 = pd.Series(list1, index=np.arange(1, 11))
s1

1      5
2     10
3     15
4     20
5     25
6     30
7     35
8     40
9     45
10    50
dtype: int64

In [41]:
li = [5, 10, 15]
s = pd.Series(li, index=(1,2,3))
s

1     5
2    10
3    15
dtype: int64

In [45]:
pd.Series(np.arange(1,11)*5,index = np.arange(1,11))

1      5
2     10
3     15
4     20
5     25
6     30
7     35
8     40
9     45
10    50
dtype: int32

In [48]:
s5 ={1:5,2:10,3:15}
b7 = pd.Series(s5)
b7


1     5
2    10
3    15
dtype: int64
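
The attempts above can be folded into one reusable function. This is only a sketch: `times_table` and its `upto` parameter are names chosen here for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd

def times_table(n, upto=10):
    """Return the multiplication table of n as a Series indexed 1..upto."""
    multipliers = np.arange(1, upto + 1)
    # NumPy multiplies every element by n in one vectorised step.
    return pd.Series(multipliers * n, index=multipliers, name=f"table_of_{n}")

print(times_table(5))
```

Passing `upto=3` would reproduce the short three-row table from the task statement.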

# Pandas DataFrame

In [46]:
# converting dict into Dataframe
di

{1: 5, 2: 10, 3: 15}

In [51]:
df1 = pd.DataFrame(di, index = ["a","b","c"])
df1
# the dict keys become the column names

| |1|2|3|
|--|--|--|--|
|a|5|10|15|
|b|5|10|15|
|c|5|10|15|


In [53]:
df1.columns = ["X","Y","Z"]
df1

| |X|Y|Z|
|--|--|--|--|
|a|5|10|15|
|b|5|10|15|
|c|5|10|15|


In [54]:
df1.shape # (rows,columns)

(3, 3)
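
The `index` argument matters here because the dict holds scalar values rather than lists. A small sketch of what happens with and without it (the variable names are illustrative):

```python
import pandas as pd

scalars = {1: 5, 2: 10, 3: 15}

# With scalar values only, pandas cannot infer how many rows to build.
try:
    pd.DataFrame(scalars)
except ValueError as exc:
    print("ValueError:", exc)

# Passing an index repeats each scalar once per row label.
df = pd.DataFrame(scalars, index=["a", "b", "c"])
print(df.shape)
```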

In [56]:
# converting a list of lists into a DataFrame
df2 = pd.DataFrame([[1,2,3],[3,4,5],[6,7,8]])
df2
# row and column labels default to 0, 1, 2, ...

| |0|1|2|
|--|--|--|--|
|0|1|2|3|
|1|3|4|5|
|2|6|7|8|


In [57]:
df2.columns = ["er","56","df"]
df2

| |er|56|df|
|--|--|--|--|
|0|1|2|3|
|1|3|4|5|
|2|6|7|8|


In [58]:
df2.index = ["e","t","w"]
df2

| |er|56|df|
|--|--|--|--|
|e|1|2|3|
|t|3|4|5|
|w|6|7|8|


In [61]:
d2 = {
    "Name":["HemaSundar","Manish","Vamsi"],
    "Gender":["Male","Male",np.nan],
    "PIN" : [481,512,345]
}
df3 = pd.DataFrame(d2)
df3

| |Name|Gender|PIN|
|--|--|--|--|
|0|HemaSundar|Male|481|
|1|Manish|Male|512|
|2|Vamsi|NaN|345|


In [63]:
type(df3["Name"])

pandas.core.series.Series

In [64]:
df3["Name"]

0    HemaSundar
1        Manish
2         Vamsi
Name: Name, dtype: object

In [65]:
df3["PIN"]

0    481
1    512
2    345
Name: PIN, dtype: int64

In [66]:
df3["Name","PIN"]  # a tuple is treated as one column label, so this raises KeyError

KeyError: ('Name', 'PIN')

In [67]:
df3[["Name","PIN"]] # a list of column names selects a sub-DataFrame

| |Name|PIN|
|--|--|--|
|0|HemaSundar|481|
|1|Manish|512|
|2|Vamsi|345|
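
The three column-selection cells above differ only in what is passed inside the brackets. A compact sketch of the rule, rebuilding the same table under the name `df`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["HemaSundar", "Manish", "Vamsi"],
    "Gender": ["Male", "Male", np.nan],
    "PIN": [481, 512, 345],
})

one = df["Name"]             # a single label returns a Series
sub = df[["Name", "PIN"]]    # a list of labels returns a DataFrame
also_df = df[["PIN"]]        # a one-element list still returns a DataFrame

print(type(one).__name__, type(sub).__name__, type(also_df).__name__)
```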


In [69]:
df3[2:3]

| |Name|Gender|PIN|
|--|--|--|--|
|2|Vamsi|NaN|345|


In [71]:
df3[::-1]

| |Name|Gender|PIN|
|--|--|--|--|
|2|Vamsi|NaN|345|
|1|Manish|Male|512|
|0|HemaSundar|Male|481|


In [72]:
df3["Name"]

0    HemaSundar
1        Manish
2         Vamsi
Name: Name, dtype: object

In [74]:
df3[:1]

| |Name|Gender|PIN|
|--|--|--|--|
|0|HemaSundar|Male|481|


In [77]:
df3.set_index("Name")

|Name|Gender|PIN|
|--|--|--|
|HemaSundar|Male|481|
|Manish|Male|512|
|Vamsi|NaN|345|


In [78]:
df3

| |Name|Gender|PIN|
|--|--|--|--|
|0|HemaSundar|Male|481|
|1|Manish|Male|512|
|2|Vamsi|NaN|345|


In [79]:
df3.set_index("Name", inplace = True) # modifies df3 itself instead of returning a new DataFrame

In [80]:
df3

|Name|Gender|PIN|
|--|--|--|
|HemaSundar|Male|481|
|Manish|Male|512|
|Vamsi|NaN|345|
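
An alternative to `inplace = True` is to keep the returned DataFrame under a new name, which leaves the original untouched. A minimal sketch (the names `indexed` and `restored` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["HemaSundar", "Manish", "Vamsi"],
    "Gender": ["Male", "Male", np.nan],
    "PIN": [481, 512, 345],
})

indexed = df.set_index("Name")      # returns a new DataFrame; df is unchanged
print(list(df.columns))             # "Name" is still a column of df
print(list(indexed.columns))        # "Name" has moved into the index

restored = indexed.reset_index()    # moves the index back into a column
print(restored.equals(df))
```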


## iloc -- for accessing rows by integer position
## loc -- for accessing rows by index label

In [83]:
df3.iloc[0]

Gender    Male
PIN        481
Name: HemaSundar, dtype: object

In [84]:
df3[0]  # [] looks up a column label, and there is no column named 0

KeyError: 0

In [86]:
df3.iloc[2]

Gender    NaN
PIN       345
Name: Vamsi, dtype: object

In [89]:
df3.loc["HemaSundar"]

Gender    Male
PIN        481
Name: HemaSundar, dtype: object

In [92]:
df3.loc[["Manish","Vamsi"]]

|Name|Gender|PIN|
|--|--|--|
|Manish|Male|512|
|Vamsi|NaN|345|


In [94]:
df3.loc[["Manish","Vamsi"], "PIN"]

Name
Manish    512
Vamsi     345
Name: PIN, dtype: int64
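
Both accessors accept a row selector and a column selector together. A short sketch on a table shaped like `df3` after `set_index("Name")`, rebuilt here as `df` so the block runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Gender": ["Male", "Male", np.nan], "PIN": [481, 512, 345]},
    index=pd.Index(["HemaSundar", "Manish", "Vamsi"], name="Name"),
)

print(df.loc["Manish", "PIN"])            # row label, column label
print(df.iloc[1, 1])                      # second row, second column
print(df.loc["Manish":"Vamsi", ["PIN"]])  # label slice of rows, list of columns
print(df.iloc[-1])                        # last row by position
```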

In [97]:
df3.reset_index(inplace = True)

In [98]:
df3

| |Name|Gender|PIN|
|--|--|--|--|
|0|HemaSundar|Male|481|
|1|Manish|Male|512|
|2|Vamsi|NaN|345|
