## Pandas – DataFrame and Series

Pandas is a Python library for data manipulation, widely used for data analysis and data cleaning.  
It provides two primary data structures: **Series** and **DataFrame**.

### Series
A **Series** is a one-dimensional, array-like object that can hold data of any type and has labeled indices.
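To make the definition concrete, here is a minimal sketch of how the labels and the values of a Series relate. The variable name `temps` and its data are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data: temperatures keyed by weekday label.
temps = pd.Series([21.5, 23.0, 19.8], index=["mon", "tue", "wed"])

print(temps.index.tolist())  # the index labels
print(temps.to_numpy())      # the underlying one-dimensional values
print(temps["tue"])          # lookup by label
print(temps.iloc[0])         # lookup by integer position
```

A value can be reached either through its label or through its position, which is what sets a Series apart from a plain list.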

### DataFrame
A **DataFrame** is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).  
It is similar to a table in a database or an Excel spreadsheet.
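The terms "heterogeneous" and "size-mutable" can be seen in a small sketch. The `inventory` frame and its columns are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical data: each column holds a different type.
inventory = pd.DataFrame({
    "item": ["pen", "notebook"],
    "price": [1.5, 4.0],
    "in_stock": [True, False],
})

print(inventory.dtypes)     # one dtype per column (heterogeneous)

inventory["qty"] = [10, 0]  # adding a column changes the shape (size-mutable)
print(inventory.shape)      # (rows, columns)
```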

In [1]:
!pip install pandas



In [2]:
import pandas as pd 


In [3]:
# creating series from a list
data=[1,2,3,4,5]
s1=pd.Series(data)
print(s1)

0    1
1    2
2    3
3    4
4    5
dtype: int64


In [4]:
# creating series from dictionary
data={'a':1,'b':2,'c':3}
s2=pd.Series(data)
print(s2)

a    1
b    2
c    3
dtype: int64


In [5]:
# creating series with custom index labels
data=[1,2,3,4,5]
idx=['a','b','c','d','e']
s3=pd.Series(data,index=idx)
print(s3)

a    1
b    2
c    3
d    4
e    5
dtype: int64


In [6]:
# creating dataframes with dictionary
data={
    'name':['vijay','dinkar','yash'],
    'age':[20,21,22],
    'place':['bihar','bangalore','uttarpradesh']
}
df=pd.DataFrame(data)
print(df)

     name  age         place
0   vijay   20         bihar
1  dinkar   21     bangalore
2    yash   22  uttarpradesh


In [7]:
# convert the dataframe to a NumPy array (df.to_numpy() is the pandas-native equivalent)
import numpy as np
np.array(df)

array([['vijay', 20, 'bihar'],
       ['dinkar', 21, 'bangalore'],
       ['yash', 22, 'uttarpradesh']], dtype=object)

In [8]:
# adding a new column
df['salary']=[50000,60000,70000]

In [9]:
df

     name  age         place  salary
0   vijay   20         bihar   50000
1  dinkar   21     bangalore   60000
2    yash   22  uttarpradesh   70000


In [13]:
# dropping the 'place' column in place (axis=1 selects columns)
df.drop('place',axis=1,inplace=True)

In [14]:
df

     name  age  salary
0   vijay   20   50000
1  dinkar   21   60000
2    yash   22   70000


In [16]:
# label-based access: row label 0, column 'name'
df.loc[0,'name']

'vijay'

In [18]:
# position-based access: first row, first column
df.iloc[0,0]

'vijay'
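The two lookups above return the same value only because the default row labels (0, 1, 2) happen to match the row positions. A minimal sketch of the difference, assuming a hypothetical frame `people` with non-default row labels:

```python
import pandas as pd

people = pd.DataFrame(
    {"name": ["vijay", "dinkar", "yash"], "age": [20, 21, 22]},
    index=[10, 20, 30],  # hypothetical non-default row labels
)

by_label = people.loc[10, "name"]  # row labelled 10, column "name"
by_position = people.iloc[0, 0]    # first row, first column
print(by_label, by_position)
```

Here `people.loc[0, "name"]` would raise a `KeyError`, since no row carries the label 0, while `people.iloc[0, 0]` still works.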

In [22]:
df.describe()

        age   salary
count   3.0      3.0
mean   21.0  60000.0
std     1.0  10000.0
min    20.0  50000.0
25%    20.5  55000.0
50%    21.0  60000.0
75%    21.5  65000.0
max    22.0  70000.0


In [23]:
df1=pd.read_csv('data.csv')

In [24]:
df1

   Duration          Date  Pulse  Maxpulse  Calories
0        60  '2020/12/01'    110       130     409.1
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0
5        60  '2020/12/06'    102       127     300.0
6        60  '2020/12/07'    110       136     374.0
7       450  '2020/12/08'    104       134     253.3
8        30  '2020/12/09'    109       133     195.1
9        60  '2020/12/10'     98       124     269.0


In [29]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Duration  32 non-null     int64  
 1   Date      31 non-null     object 
 2   Pulse     32 non-null     int64  
 3   Maxpulse  32 non-null     int64  
 4   Calories  30 non-null     float64
dtypes: float64(1), int64(3), object(1)
memory usage: 1.4+ KB


In [26]:
df1.describe()

        Duration     Pulse   Maxpulse   Calories
count       32.0      32.0       32.0       30.0
mean     68.4375     103.5      128.5     304.68
std    70.039591  7.832933  12.998759  66.003779
min         30.0      90.0      101.0      195.1
25%         60.0     100.0      120.0      250.7
50%         60.0     102.5      127.5      291.2
75%         60.0     106.5     132.25    343.975
max        450.0     130.0      175.0      479.0


In [30]:
# converting the Duration column from int to float
df1['Duration']=df1['Duration'].astype(float)

In [31]:
df1

   Duration          Date  Pulse  Maxpulse  Calories
0      60.0  '2020/12/01'    110       130     409.1
1      60.0  '2020/12/02'    117       145     479.0
2      60.0  '2020/12/03'    103       135     340.0
3      45.0  '2020/12/04'    109       175     282.4
4      45.0  '2020/12/05'    117       148     406.0
5      60.0  '2020/12/06'    102       127     300.0
6      60.0  '2020/12/07'    110       136     374.0
7     450.0  '2020/12/08'    104       134     253.3
8      30.0  '2020/12/09'    109       133     195.1
9      60.0  '2020/12/10'     98       124     269.0


In [32]:
# adding 50 to every value in the Maxpulse column (element-wise)
df1['Maxpulse']=df1['Maxpulse']+50

In [33]:
df1

   Duration          Date  Pulse  Maxpulse  Calories
0      60.0  '2020/12/01'    110       180     409.1
1      60.0  '2020/12/02'    117       195     479.0
2      60.0  '2020/12/03'    103       185     340.0
3      45.0  '2020/12/04'    109       225     282.4
4      45.0  '2020/12/05'    117       198     406.0
5      60.0  '2020/12/06'    102       177     300.0
6      60.0  '2020/12/07'    110       186     374.0
7     450.0  '2020/12/08'    104       184     253.3
8      30.0  '2020/12/09'    109       183     195.1
9      60.0  '2020/12/10'     98       174     269.0


In [35]:
# counting missing values per column
df1.isnull().sum()

Duration    0
Date        1
Pulse       0
Maxpulse    0
Calories    2
dtype: int64

In [36]:
# preview with missing values replaced by 0; this returns a new dataframe and leaves df1 unchanged
df1.fillna(0)

   Duration          Date  Pulse  Maxpulse  Calories
0      60.0  '2020/12/01'    110       180     409.1
1      60.0  '2020/12/02'    117       195     479.0
2      60.0  '2020/12/03'    103       185     340.0
3      45.0  '2020/12/04'    109       225     282.4
4      45.0  '2020/12/05'    117       198     406.0
5      60.0  '2020/12/06'    102       177     300.0
6      60.0  '2020/12/07'    110       186     374.0
7     450.0  '2020/12/08'    104       184     253.3
8      30.0  '2020/12/09'    109       183     195.1
9      60.0  '2020/12/10'     98       174     269.0


In [37]:
# filling missing Calories values with the column mean
df1['Calories']=df1['Calories'].fillna(df1['Calories'].mean())

In [38]:
df1

   Duration          Date  Pulse  Maxpulse  Calories
0      60.0  '2020/12/01'    110       180     409.1
1      60.0  '2020/12/02'    117       195     479.0
2      60.0  '2020/12/03'    103       185     340.0
3      45.0  '2020/12/04'    109       225     282.4
4      45.0  '2020/12/05'    117       198     406.0
5      60.0  '2020/12/06'    102       177     300.0
6      60.0  '2020/12/07'    110       186     374.0
7     450.0  '2020/12/08'    104       184     253.3
8      30.0  '2020/12/09'    109       183     195.1
9      60.0  '2020/12/10'     98       174     269.0


In [39]:
df1['Maxpulse']

0     180
1     195
2     185
3     225
4     198
5     177
6     186
7     184
8     183
9     174
10    197
11    170
12    170
13    178
14    182
15    173
16    170
17    170
18    162
19    173
20    175
21    181
22    169
23    151
24    182
25    176
26    170
27    168
28    182
29    182
30    179
31    165
Name: Maxpulse, dtype: int64

In [41]:
df1.head(10)

   Duration          Date  Pulse  Maxpulse  Calories
0      60.0  '2020/12/01'    110       180     409.1
1      60.0  '2020/12/02'    117       195     479.0
2      60.0  '2020/12/03'    103       185     340.0
3      45.0  '2020/12/04'    109       225     282.4
4      45.0  '2020/12/05'    117       198     406.0
5      60.0  '2020/12/06'    102       177     300.0
6      60.0  '2020/12/07'    110       186     374.0
7     450.0  '2020/12/08'    104       184     253.3
8      30.0  '2020/12/09'    109       183     195.1
9      60.0  '2020/12/10'     98       174     269.0


In [42]:
df1.tail()

    Duration          Date  Pulse  Maxpulse  Calories
27      60.0  '2020/12/27'     92       168     241.0
28      60.0  '2020/12/28'    103       182    304.68
29      60.0  '2020/12/29'    100       182     280.0
30      60.0  '2020/12/30'    102       179     380.3
31      60.0  '2020/12/31'     92       165     243.0


In [43]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Duration  32 non-null     float64
 1   Date      31 non-null     object 
 2   Pulse     32 non-null     int64  
 3   Maxpulse  32 non-null     int64  
 4   Calories  32 non-null     float64
dtypes: float64(2), int64(2), object(1)
memory usage: 1.4+ KB
