# Introduction to pandas

---------------------------------

**What is pandas**

- *pandas is an open-source data analysis library written in Python*

- *It builds on the speed of NumPy arrays to make data analysis and preprocessing easy for data scientists*

- *It provides operations for loading, cleaning, selecting, reshaping and summarizing tabular data*

**Pandas data structure**

*Pandas has two main data structures:*
 
- *a) Series - a one-dimensional labeled array capable of holding any type of data; a single column (or row) of a DataFrame is a Series*

- *b) DataFrame - a two-dimensional labeled structure, similar to a spreadsheet table, whose columns can hold different types of data*
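
As a minimal sketch of the two structures (the labels and values here are made up for illustration), a Series can be built from a list and a DataFrame from a dict of equal-length lists:

```python
import pandas as pd

# A Series: one-dimensional, every value has an index label.
marks = pd.Series([92, 34, 24], index=["a", "b", "c"], name="marks")

# A DataFrame: two-dimensional, columns may have different dtypes.
table = pd.DataFrame({"name": ["x", "y", "z"], "marks": [92, 34, 24]})

print(marks["b"])            # look up a value by its label
print(table.shape)           # (rows, columns)
print(type(table["marks"]))  # a single column is itself a Series
```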

-------------------------------------------------

In [203]:
import numpy as np
import pandas as pd

**DataFrame**

In [204]:
dict1 = {
    "name" : ['harry', 'rohan', 'skillf', 'shubh'],
    "marks" : [92,34,24,17],
    "city" :['rampur', 'kolkata','bareilly', 'antartica']
}

In [205]:
df = pd.DataFrame(dict1)

In [206]:
df

Unnamed: 0,name,marks,city
0,harry,92,rampur
1,rohan,34,kolkata
2,skillf,24,bareilly
3,shubh,17,antartica


converting the DataFrame to a CSV file; by default the index is written as the first column

In [207]:
df.to_csv('friends.csv')

to leave the index out of the file, pass index=False

In [208]:
df.to_csv('friends_index_false.csv', index = False)
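
The effect of the index argument can be seen without touching the disk. The sketch below assumes a small stand-in frame and uses the fact that to_csv returns the CSV text when no path is given:

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["harry", "rohan"], "marks": [92, 34]})

# With no path, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # header starts with an empty field for the index
print(without_index.splitlines()[0])  # header holds only the column names

# Reading the first version back turns the saved index into an extra column.
round_trip = pd.read_csv(io.StringIO(with_index))
print(list(round_trip.columns))

# index_col=0 tells read_csv to treat that first column as the index again.
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(list(restored.columns))
```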

to get the first n (here n = 2) rows

In [209]:
df.head(2)

Unnamed: 0,name,marks,city
0,harry,92,rampur
1,rohan,34,kolkata


to get the last n (here n = 2) rows

In [210]:
df.tail(2)

Unnamed: 0,name,marks,city
2,skillf,24,bareilly
3,shubh,17,antartica


describe() returns summary statistics (count, mean, standard deviation, minimum, quartiles and maximum) for the numeric columns

In [211]:
df.describe()

Unnamed: 0,marks
count,4.0
mean,41.75
std,34.21866
min,17.0
25%,22.25
50%,29.0
75%,48.5
max,92.0
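
Each row of the describe() output corresponds to a standalone method. A small sketch using the same marks values as above:

```python
import pandas as pd

marks = pd.Series([92, 34, 24, 17], name="marks")
summary = marks.describe()

# Each entry of describe() matches the corresponding standalone method.
print(summary["mean"], marks.mean())
print(summary["50%"], marks.median())
print(summary["std"], marks.std())   # sample standard deviation (ddof=1)
```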


**Reading a CSV**

read_csv reads the file and returns its contents as a DataFrame, so the pravith variable below is a DataFrame

In [212]:
pravith = pd.read_csv('pravith.csv')

In [213]:
pravith

Unnamed: 0.5,Unnamed: 0.4,Unnamed: 0.3,Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,Train No,Speed,City
0,0,0,0,0,0,12322,50,rampur
1,1,1,1,1,1,12534,123,kolkata
2,2,2,2,2,2,125654,67,bareilly
3,3,3,3,3,3,564523,87,antartica


printing a column

In [214]:
pravith['Speed']

0     50
1    123
2     67
3     87
Name: Speed, dtype: int64

changing the value of a cell (chained indexing like this triggers the warning shown below; ignore it for now, the loc section covers the recommended form)

In [215]:
pravith['Speed'][0] = 50

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pravith['Speed'][0] = 50


In [216]:
pravith

Unnamed: 0.5,Unnamed: 0.4,Unnamed: 0.3,Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,Train No,Speed,City
0,0,0,0,0,0,12322,50,rampur
1,1,1,1,1,1,12534,123,kolkata
2,2,2,2,2,2,125654,67,bareilly
3,3,3,3,3,3,564523,87,antartica


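after changing the values, saving the file again. Because to_csv writes the index by default, every save-and-reload cycle adds another `Unnamed: 0` column, which is where the extra columns above come from; passing index=False avoids this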

In [217]:
pravith.to_csv("pravith.csv")

renaming the index labels

In [218]:
pravith.index = ['first', 'second', 'third', 'fourth']

In [219]:
pravith

Unnamed: 0.5,Unnamed: 0.4,Unnamed: 0.3,Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,Train No,Speed,City
first,0,0,0,0,0,12322,50,rampur
second,1,1,1,1,1,12534,123,kolkata
third,2,2,2,2,2,125654,67,bareilly
fourth,3,3,3,3,3,564523,87,antartica


**Series**

In [220]:
ser = pd.Series(np.random.rand(34))

In [221]:
type(ser)

pandas.core.series.Series

In [222]:
ser

0     0.052184
1     0.779164
2     0.704585
3     0.381021
4     0.068627
5     0.457801
6     0.100164
7     0.173520
8     0.691598
9     0.681965
10    0.988326
11    0.710068
12    0.212687
13    0.495911
14    0.072384
15    0.954232
16    0.349935
17    0.227867
18    0.154899
19    0.916824
20    0.004148
21    0.618509
22    0.729096
23    0.122566
24    0.399871
25    0.022402
26    0.923234
27    0.299492
28    0.838385
29    0.078616
30    0.612715
31    0.680572
32    0.836693
33    0.059686
dtype: float64

----

In [223]:
newdf = pd.DataFrame(np.random.rand(334,5), index=np.arange(334))

In [224]:
type(newdf)

pandas.core.frame.DataFrame

In [225]:
newdf

Unnamed: 0,0,1,2,3,4
0,0.761745,0.553429,0.280314,0.536155,0.956668
1,0.661149,0.364377,0.382680,0.286471,0.293963
2,0.732445,0.651605,0.221675,0.410623,0.775324
3,0.075717,0.585709,0.092021,0.532596,0.743838
4,0.969634,0.056530,0.615939,0.577099,0.953095
...,...,...,...,...,...
329,0.470670,0.270655,0.085958,0.105299,0.498442
330,0.592158,0.062938,0.127649,0.304817,0.579669
331,0.288743,0.843733,0.252554,0.564244,0.906881
332,0.993500,0.684352,0.196891,0.514855,0.696257


In [226]:
newdf.describe()

Unnamed: 0,0,1,2,3,4
count,334.0,334.0,334.0,334.0,334.0
mean,0.468831,0.511712,0.534556,0.49915,0.497113
std,0.296283,0.294659,0.298954,0.290195,0.273115
min,0.002883,0.000656,0.006393,0.004649,0.011093
25%,0.209168,0.264402,0.277644,0.257406,0.282021
50%,0.458742,0.516264,0.55218,0.509497,0.496575
75%,0.742227,0.770072,0.78975,0.728916,0.724204
max,0.996464,0.999404,0.997226,0.997193,0.985164


In [227]:
newdf.dtypes

0    float64
1    float64
2    float64
3    float64
4    float64
dtype: object

In [228]:
newdf[0][0] = "Pravith"
newdf.dtypes

0     object
1    float64
2    float64
3    float64
4    float64
dtype: object

In [229]:
newdf.head()

Unnamed: 0,0,1,2,3,4
0,Pravith,0.553429,0.280314,0.536155,0.956668
1,0.661149,0.364377,0.38268,0.286471,0.293963
2,0.732445,0.651605,0.221675,0.410623,0.775324
3,0.075717,0.585709,0.092021,0.532596,0.743838
4,0.969634,0.05653,0.615939,0.577099,0.953095


In [230]:
newdf.index

Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
            ...
            324, 325, 326, 327, 328, 329, 330, 331, 332, 333],
           dtype='int64', length=334)

In [231]:
newdf.columns

RangeIndex(start=0, stop=5, step=1)

*Converting into a numpy array*

In [232]:
newdf.to_numpy()

array([['Pravith', 0.5534286083970463, 0.2803141097517444,
        0.5361551258893663, 0.95666845062724],
       [0.6611488023042829, 0.36437658896159253, 0.382680226074934,
        0.28647134772968297, 0.29396267875353266],
       [0.7324447319567492, 0.6516051955069866, 0.22167490928549882,
        0.4106227419599363, 0.7753243220881472],
       ...,
       [0.28874300760222493, 0.8437334155862034, 0.25255386720462325,
        0.5642444530488934, 0.9068805257665721],
       [0.9934997653402378, 0.6843518628537746, 0.19689065648068027,
        0.5148547882464602, 0.6962570376356271],
       [0.0605329326715911, 0.7379991769569214, 0.24547407442325497,
        0.1927213340665368, 0.5639182808903227]], dtype=object)
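
The dtype=object in the array above follows from the mixed column. A rough sketch with made-up frames shows how the array dtype depends on what the columns hold:

```python
import numpy as np
import pandas as pd

numeric = pd.DataFrame(np.zeros((2, 2)), columns=["a", "b"])
mixed = pd.DataFrame({"a": [1.5, 2.5], "b": ["x", "y"]})

# All-float columns convert to a float array.
print(numeric.to_numpy().dtype)

# A frame mixing floats and strings falls back to dtype=object.
print(mixed.to_numpy().dtype)

# A Series holding both a string and a float is stored as generic Python objects.
print(pd.Series(["Pravith", 0.66]).dtype)
```

Object arrays are slower than numeric ones, so it is usually worth keeping numeric columns free of stray strings.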

In [233]:
newdf[0][0] = 0.3

In [234]:
newdf.head()

Unnamed: 0,0,1,2,3,4
0,0.3,0.553429,0.280314,0.536155,0.956668
1,0.661149,0.364377,0.38268,0.286471,0.293963
2,0.732445,0.651605,0.221675,0.410623,0.775324
3,0.075717,0.585709,0.092021,0.532596,0.743838
4,0.969634,0.05653,0.615939,0.577099,0.953095


Transpose

In [235]:
newdf.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,324,325,326,327,328,329,330,331,332,333
0,0.3,0.661149,0.732445,0.075717,0.969634,0.795666,0.615064,0.366586,0.08855,0.541291,...,0.552476,0.296006,0.421719,0.692425,0.782307,0.47067,0.592158,0.288743,0.9935,0.060533
1,0.553429,0.364377,0.651605,0.585709,0.05653,0.15011,0.490363,0.841779,0.15761,0.882984,...,0.075623,0.598987,0.810767,0.702007,0.072432,0.270655,0.062938,0.843733,0.684352,0.737999
2,0.280314,0.38268,0.221675,0.092021,0.615939,0.300824,0.183623,0.869164,0.841358,0.40754,...,0.938198,0.236616,0.949927,0.826838,0.543886,0.085958,0.127649,0.252554,0.196891,0.245474
3,0.536155,0.286471,0.410623,0.532596,0.577099,0.025022,0.675744,0.196994,0.561683,0.358874,...,0.470075,0.368165,0.693878,0.126426,0.273725,0.105299,0.304817,0.564244,0.514855,0.192721
4,0.956668,0.293963,0.775324,0.743838,0.953095,0.07925,0.674815,0.966472,0.45598,0.25279,...,0.776581,0.333305,0.116724,0.865662,0.820446,0.498442,0.579669,0.906881,0.696257,0.563918


**Sorting a dataframe**

axis=0 sorts by the row labels (the index)

axis=1 sorts by the column labels

ascending=True by default

ascending=False to get in descending order

In [236]:
newdf.sort_index(axis=0, ascending=False)
# returns a new DataFrame with the rows in descending index order; newdf itself is unchanged

Unnamed: 0,0,1,2,3,4
333,0.060533,0.737999,0.245474,0.192721,0.563918
332,0.9935,0.684352,0.196891,0.514855,0.696257
331,0.288743,0.843733,0.252554,0.564244,0.906881
330,0.592158,0.062938,0.127649,0.304817,0.579669
329,0.47067,0.270655,0.085958,0.105299,0.498442
...,...,...,...,...,...
4,0.969634,0.056530,0.615939,0.577099,0.953095
3,0.075717,0.585709,0.092021,0.532596,0.743838
2,0.732445,0.651605,0.221675,0.410623,0.775324
1,0.661149,0.364377,0.382680,0.286471,0.293963
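
sort_index orders by labels, while sort_values orders by the data in a column. A small sketch with an illustrative frame (the names are hypothetical) shows both:

```python
import pandas as pd

scores = pd.DataFrame(
    {"b": [3, 1, 2], "a": [30, 10, 20]},
    index=[2, 0, 1],
)

by_row_label = scores.sort_index(axis=0)                # rows ordered 0, 1, 2
by_col_label = scores.sort_index(axis=1)                # columns ordered a, b
by_value = scores.sort_values(by="a", ascending=False)  # rows ordered by column "a"

print(by_row_label.index.tolist())
print(by_col_label.columns.tolist())
print(by_value["a"].tolist())
print(scores.index.tolist())   # the original order is untouched
```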


In [237]:
newdf.head()

Unnamed: 0,0,1,2,3,4
0,0.3,0.553429,0.280314,0.536155,0.956668
1,0.661149,0.364377,0.38268,0.286471,0.293963
2,0.732445,0.651605,0.221675,0.410623,0.775324
3,0.075717,0.585709,0.092021,0.532596,0.743838
4,0.969634,0.05653,0.615939,0.577099,0.953095


Each column of a DataFrame is a Series

In [238]:
type(newdf[0])

pandas.core.series.Series

*Assigning an existing DataFrame to a new variable does not copy it. Both names refer to the same object in memory, so a change made through one name is visible through the other (similar to a view in a DBMS)*

In [239]:
newdf2 = newdf
newdf2[0][0]= 9783

In [240]:
newdf
# now we can observe that newdf also got changed

Unnamed: 0,0,1,2,3,4
0,9783,0.553429,0.280314,0.536155,0.956668
1,0.661149,0.364377,0.382680,0.286471,0.293963
2,0.732445,0.651605,0.221675,0.410623,0.775324
3,0.075717,0.585709,0.092021,0.532596,0.743838
4,0.969634,0.056530,0.615939,0.577099,0.953095
...,...,...,...,...,...
329,0.47067,0.270655,0.085958,0.105299,0.498442
330,0.592158,0.062938,0.127649,0.304817,0.579669
331,0.288743,0.843733,0.252554,0.564244,0.906881
332,0.9935,0.684352,0.196891,0.514855,0.696257


so, to get an independent copy, we need to call copy()

In [241]:
newdf2 = newdf.copy()
newdf2[0][0] = 34
newdf
# we can now see the original dataframe not getting changed

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  newdf2[0][0] = 34


Unnamed: 0,0,1,2,3,4
0,9783,0.553429,0.280314,0.536155,0.956668
1,0.661149,0.364377,0.382680,0.286471,0.293963
2,0.732445,0.651605,0.221675,0.410623,0.775324
3,0.075717,0.585709,0.092021,0.532596,0.743838
4,0.969634,0.056530,0.615939,0.577099,0.953095
...,...,...,...,...,...
329,0.47067,0.270655,0.085958,0.105299,0.498442
330,0.592158,0.062938,0.127649,0.304817,0.579669
331,0.288743,0.843733,0.252554,0.564244,0.906881
332,0.9935,0.684352,0.196891,0.514855,0.696257
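
The difference between a second name and a real copy can be checked directly. This sketch uses a tiny made-up frame and .loc for the assignments, so it does not depend on chained indexing:

```python
import pandas as pd

original = pd.DataFrame({"x": [1, 2, 3]})

alias = original          # second name for the same object
clone = original.copy()   # independent DataFrame with its own data

alias.loc[0, "x"] = 100   # visible through `original` too
clone.loc[1, "x"] = -1    # stays local to `clone`

print(alias is original)
print(original["x"].tolist())
print(clone["x"].tolist())
```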


# loc

**Updating values without the warning: loc takes the row label and the column label in a single indexing call**

In [242]:
newdf.loc[0,0] = 64
newdf.head(2)

Unnamed: 0,0,1,2,3,4
0,64.0,0.553429,0.280314,0.536155,0.956668
1,0.661149,0.364377,0.38268,0.286471,0.293963
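
The warning comes from chained indexing (`df[col][row] = value`), where the first step may hand back a temporary object. A minimal sketch of the single-call alternative, using a stand-in frame with hypothetical values:

```python
import pandas as pd

trains = pd.DataFrame({"Train No": [12322, 12534], "Speed": [50, 123]})

# One indexing call with (row label, column label): no intermediate object,
# so the assignment reaches the original DataFrame.
trains.loc[0, "Speed"] = 55

# .at is the scalar-only counterpart of .loc.
trains.at[1, "Speed"] = 120

print(trains)
```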


Renaming the columns

In [243]:
newdf.columns = list("ABCDE")

In [244]:
newdf.head(2)

Unnamed: 0,A,B,C,D,E
0,64.0,0.553429,0.280314,0.536155,0.956668
1,0.661149,0.364377,0.38268,0.286471,0.293963


loc is label-based, so once the columns are renamed the old label 0 no longer refers to any column

In [245]:
newdf.loc[0,0] = 456
# doesn't raise an error; since no column is labelled 0, a new column named 0 gets created

In [246]:
newdf

Unnamed: 0,A,B,C,D,E,0
0,64,0.553429,0.280314,0.536155,0.956668,456.0
1,0.661149,0.364377,0.382680,0.286471,0.293963,
2,0.732445,0.651605,0.221675,0.410623,0.775324,
3,0.075717,0.585709,0.092021,0.532596,0.743838,
4,0.969634,0.056530,0.615939,0.577099,0.953095,
...,...,...,...,...,...,...
329,0.47067,0.270655,0.085958,0.105299,0.498442,
330,0.592158,0.062938,0.127649,0.304817,0.579669,
331,0.288743,0.843733,0.252554,0.564244,0.906881,
332,0.9935,0.684352,0.196891,0.514855,0.696257,
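
This behaviour, where assigning through .loc to a label that does not exist adds it, can be reproduced on a small frame. The sketch below is illustrative only and also shows one way to guard against it:

```python
import pandas as pd

frame = pd.DataFrame({"A": [0.1, 0.2, 0.3], "B": [1.0, 2.0, 3.0]})

# The column label 0 does not exist, so .loc adds it as a new column.
frame.loc[0, 0] = 456

print(frame.columns.tolist())
print(frame[0].isna().sum())   # rows that were not assigned hold NaN

# Checking for the label first avoids creating a column by accident.
label = "A"
if label in frame.columns:
    frame.loc[0, label] = 456
```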


we need to pass the actual column label

In [247]:
newdf.loc[0, 'A'] = 456
newdf

Unnamed: 0,A,B,C,D,E,0
0,456,0.553429,0.280314,0.536155,0.956668,456.0
1,0.661149,0.364377,0.382680,0.286471,0.293963,
2,0.732445,0.651605,0.221675,0.410623,0.775324,
3,0.075717,0.585709,0.092021,0.532596,0.743838,
4,0.969634,0.056530,0.615939,0.577099,0.953095,
...,...,...,...,...,...,...
329,0.47067,0.270655,0.085958,0.105299,0.498442,
330,0.592158,0.062938,0.127649,0.304817,0.579669,
331,0.288743,0.843733,0.252554,0.564244,0.906881,
332,0.9935,0.684352,0.196891,0.514855,0.696257,


dropping a column

In [248]:
newdf = newdf.drop(0, axis=1)
# axis = 1 means columns

In [249]:
newdf.head()

Unnamed: 0,A,B,C,D,E
0,456.0,0.553429,0.280314,0.536155,0.956668
1,0.661149,0.364377,0.38268,0.286471,0.293963
2,0.732445,0.651605,0.221675,0.410623,0.775324
3,0.075717,0.585709,0.092021,0.532596,0.743838
4,0.969634,0.05653,0.615939,0.577099,0.953095


selecting only a specific set of rows and columns

In [250]:
newdf.loc[[1,2], ['C', 'D']]
# the selection is returned as a new DataFrame; newdf itself is not changed

Unnamed: 0,C,D
1,0.38268,0.286471
2,0.221675,0.410623


In [251]:
newdf.head()

Unnamed: 0,A,B,C,D,E
0,456.0,0.553429,0.280314,0.536155,0.956668
1,0.661149,0.364377,0.38268,0.286471,0.293963
2,0.732445,0.651605,0.221675,0.410623,0.775324
3,0.075717,0.585709,0.092021,0.532596,0.743838
4,0.969634,0.05653,0.615939,0.577099,0.953095


to get all the columns for the chosen rows, use `:` in the column position

In [252]:
newdf.loc[[1,2], :]

Unnamed: 0,A,B,C,D,E
1,0.661149,0.364377,0.38268,0.286471,0.293963
2,0.732445,0.651605,0.221675,0.410623,0.775324


to get all the rows for the chosen columns, use `:` in the row position

In [253]:
newdf.loc[:, ['A', 'D']]

Unnamed: 0,A,D
0,456,0.536155
1,0.661149,0.286471
2,0.732445,0.410623
3,0.075717,0.532596
4,0.969634,0.577099
...,...,...
329,0.47067,0.105299
330,0.592158,0.304817
331,0.288743,0.564244
332,0.9935,0.514855


filtering rows with a combined condition

In [254]:
newdf.loc[(newdf['A'] < 0.3)&(newdf['C'] > 0.1)]

Unnamed: 0,A,B,C,D,E
8,0.08855,0.157610,0.841358,0.561683,0.455980
12,0.147137,0.503119,0.821709,0.678746,0.075432
13,0.25489,0.709960,0.638764,0.817316,0.802277
14,0.161833,0.539174,0.873054,0.576413,0.290897
18,0.26527,0.572273,0.212218,0.545846,0.644728
...,...,...,...,...,...
313,0.214902,0.535788,0.713115,0.720199,0.578950
314,0.072983,0.747936,0.920903,0.615450,0.207927
325,0.296006,0.598987,0.236616,0.368165,0.333305
331,0.288743,0.843733,0.252554,0.564244,0.906881
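
A query like the one above is built from boolean masks. The following sketch uses a small made-up frame to show how the pieces combine:

```python
import pandas as pd

frame = pd.DataFrame({"A": [0.05, 0.25, 0.80, 0.10], "C": [0.90, 0.05, 0.50, 0.40]})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than < and >.
mask = (frame["A"] < 0.3) & (frame["C"] > 0.1)

print(mask.tolist())
print(frame.loc[mask].index.tolist())

# | is element-wise "or", ~ negates a mask.
either = frame.loc[(frame["A"] < 0.3) | (frame["C"] > 0.1)]
neither = frame.loc[~((frame["A"] < 0.3) | (frame["C"] > 0.1))]
```

Python's `and` / `or` keywords do not work here, because they expect a single truth value rather than a Series.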


# iloc
selects cells by integer position (0-based), regardless of what the row and column labels are

In [255]:
newdf.iloc[0,4]

0.95666845062724

In [256]:
newdf.iloc[[0,5], [1,2]]

Unnamed: 0,B,C
0,0.553429,0.280314
5,0.15011,0.300824
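
The contrast with loc is easiest to see when the index is not numeric. A brief sketch with hypothetical labels:

```python
import pandas as pd

frame = pd.DataFrame(
    {"B": [10, 20, 30], "D": [1.5, 2.5, 3.5]},
    index=["first", "second", "third"],
)

print(frame.loc["second", "D"])   # by label
print(frame.iloc[1, 1])           # by position: same cell

# Slices differ: .loc includes the end label, .iloc excludes the end position.
print(frame.loc["first":"second"].shape)
print(frame.iloc[0:1].shape)
```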


# drop

by default

- axis = 0, that is, rows

- inplace = False, that is, original dataframe doesn't get changed

In [257]:
newdf.head(3)

Unnamed: 0,A,B,C,D,E
0,456.0,0.553429,0.280314,0.536155,0.956668
1,0.661149,0.364377,0.38268,0.286471,0.293963
2,0.732445,0.651605,0.221675,0.410623,0.775324


In [258]:
newdf.drop([0])

Unnamed: 0,A,B,C,D,E
1,0.661149,0.364377,0.382680,0.286471,0.293963
2,0.732445,0.651605,0.221675,0.410623,0.775324
3,0.075717,0.585709,0.092021,0.532596,0.743838
4,0.969634,0.056530,0.615939,0.577099,0.953095
5,0.795666,0.150110,0.300824,0.025022,0.079250
...,...,...,...,...,...
329,0.47067,0.270655,0.085958,0.105299,0.498442
330,0.592158,0.062938,0.127649,0.304817,0.579669
331,0.288743,0.843733,0.252554,0.564244,0.906881
332,0.9935,0.684352,0.196891,0.514855,0.696257


In [259]:
newdf.drop(['A', 'C'], axis=1, inplace=True)

In [260]:
newdf

Unnamed: 0,B,D,E
0,0.553429,0.536155,0.956668
1,0.364377,0.286471,0.293963
2,0.651605,0.410623,0.775324
3,0.585709,0.532596,0.743838
4,0.056530,0.577099,0.953095
...,...,...,...
329,0.270655,0.105299,0.498442
330,0.062938,0.304817,0.579669
331,0.843733,0.564244,0.906881
332,0.684352,0.514855,0.696257


In [261]:
newdf.drop([1,5], axis=0, inplace = True)
newdf

Unnamed: 0,B,D,E
0,0.553429,0.536155,0.956668
2,0.651605,0.410623,0.775324
3,0.585709,0.532596,0.743838
4,0.056530,0.577099,0.953095
6,0.490363,0.675744,0.674815
...,...,...,...
329,0.270655,0.105299,0.498442
330,0.062938,0.304817,0.579669
331,0.843733,0.564244,0.906881
332,0.684352,0.514855,0.696257
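
The defaults listed above can be checked on a small example. This sketch uses a made-up frame and the `index=` / `columns=` keywords, which are an alternative spelling of the `axis` argument:

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

without_row = frame.drop([0])                  # axis=0 by default: drops the row labelled 0
without_cols = frame.drop(columns=["A", "C"])  # same as drop(["A", "C"], axis=1)

print(frame.shape)           # unchanged, because drop returned new objects
print(without_row.index.tolist())
print(without_cols.columns.tolist())

# Reassigning the result is an alternative to inplace=True.
frame = frame.drop(index=[1])
print(frame.index.tolist())
```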


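# reset_index

renumbers the index from 0 after rows have been dropped; the dropped rows are not restored. With drop=True the old index labels are discarded instead of being kept as a new column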

In [262]:
newdf.reset_index(drop = True, inplace = True)
newdf

Unnamed: 0,B,D,E
0,0.553429,0.536155,0.956668
1,0.651605,0.410623,0.775324
2,0.585709,0.532596,0.743838
3,0.056530,0.577099,0.953095
4,0.490363,0.675744,0.674815
...,...,...,...
327,0.270655,0.105299,0.498442
328,0.062938,0.304817,0.579669
329,0.843733,0.564244,0.906881
330,0.684352,0.514855,0.696257
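
A short sketch of what reset_index does and does not do, using an illustrative four-row frame:

```python
import pandas as pd

frame = pd.DataFrame({"B": [10, 20, 30, 40]})
trimmed = frame.drop([1])             # index is now 0, 2, 3

renumbered = trimmed.reset_index(drop=True)   # fresh 0..n-1 index, old labels discarded
kept = trimmed.reset_index()                  # old labels saved in a column named "index"

print(renumbered.index.tolist())
print(kept.columns.tolist())
print(len(renumbered))    # still 3 rows: the dropped row does not come back
```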


# isnull

returns an object of the same shape filled with booleans: True where a cell is null (missing), False otherwise

In [294]:
newdf.isnull()

Unnamed: 0,B,D,E
0,False,False,False
1,False,False,False
2,False,False,False
3,False,False,False
4,False,False,False
...,...,...,...
327,False,False,False
328,False,False,False
329,False,False,False
330,False,False,False


In [263]:
newdf['B'].isnull()

0      False
1      False
2      False
3      False
4      False
       ...  
327    False
328    False
329    False
330    False
331    False
Name: B, Length: 332, dtype: bool

In [264]:
newdf.loc[:,'B'] = None
newdf['B'].isnull()

0      True
1      True
2      True
3      True
4      True
       ... 
327    True
328    True
329    True
330    True
331    True
Name: B, Length: 332, dtype: bool

In [265]:
newdf.loc[:,'B'] = 34
newdf['B'].isnull()

0      False
1      False
2      False
3      False
4      False
       ...  
327    False
328    False
329    False
330    False
331    False
Name: B, Length: 332, dtype: bool

In [266]:
newdf

Unnamed: 0,B,D,E
0,34,0.536155,0.956668
1,34,0.410623,0.775324
2,34,0.532596,0.743838
3,34,0.577099,0.953095
4,34,0.675744,0.674815
...,...,...,...
327,34,0.105299,0.498442
328,34,0.304817,0.579669
329,34,0.564244,0.906881
330,34,0.514855,0.696257
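
Since the result of isnull() is a boolean frame, it is often combined with sum() or any() rather than read cell by cell. A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"B": [1.0, np.nan, 3.0], "D": [np.nan, np.nan, 6.0]})

null_mask = frame.isnull()            # same shape as frame, True where a value is missing
per_column = null_mask.sum()          # True counts as 1, so this counts nulls per column
rows_with_null = frame[null_mask.any(axis=1)]

print(per_column.to_dict())
print(rows_with_null.index.tolist())
print(frame.notnull().all(axis=1).tolist())   # rows that are fully populated
```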


# dropna
removes the rows (or columns) that contain null values

In [275]:
df = pd.DataFrame(
    {
        "name" : ['Ranjith','Pravith', 'Bobby'],
        'state' : ['Telangana','Assam',  np.nan], # np.nan ("not a number") marks a missing value
        "born": [ pd.NaT,pd.Timestamp("2003-07-06"), pd.NaT] # pd.Timestamp holds a date/time; pd.NaT ("not a time") marks a missing date
    }
)

In [276]:
df

Unnamed: 0,name,state,born
0,Ranjith,Telangana,NaT
1,Pravith,Assam,2003-07-06
2,Bobby,,NaT


In [277]:
df.dropna()

Unnamed: 0,name,state,born
1,Pravith,Assam,2003-07-06


**how** - *{'any', 'all'}, default = 'any'*
Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.
- 'any' - if any NA values are present, drop that row or column 
- 'all' - if all values are NA, drop that row or column

In [278]:
df.dropna(how="all")

Unnamed: 0,name,state,born
0,Ranjith,Telangana,NaT
1,Pravith,Assam,2003-07-06
2,Bobby,,NaT


In [279]:
df.loc[1,'born'] = pd.NaT

In [280]:
df.dropna(how = "all", axis= 1)

Unnamed: 0,name,state
0,Ranjith,Telangana
1,Pravith,Assam
2,Bobby,
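
Besides `how` and `axis`, dropna also accepts `subset` and `thresh`. The sketch below uses a simplified stand-in for the frame above to compare them:

```python
import numpy as np
import pandas as pd

people = pd.DataFrame(
    {
        "name": ["a", "b", "c"],
        "state": ["X", np.nan, np.nan],
        "born": [pd.NaT, pd.NaT, pd.NaT],
    }
)

any_rows = people.dropna()                    # how="any": every row has a missing value
all_cols = people.dropna(how="all", axis=1)   # drops "born", the only all-missing column
by_state = people.dropna(subset=["state"])    # only look at "state" when deciding
at_least_two = people.dropna(thresh=2)        # keep rows with >= 2 non-missing values

print(len(any_rows))
print(all_cols.columns.tolist())
print(by_state.index.tolist())
print(at_least_two.index.tolist())
```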


# drop_duplicates
removes duplicate rows

**keep** : *{'first','last',False}, default : 'first'*
- first: Drop duplicates except for the first occurrence
- last : Drop duplicates except for the last occurrence
- False : Drop all duplicates

**subset** : *column label or sequence of labels, optional*
- Only consider certain columns for identifying duplicates, by default use all the columns

In [281]:
df.loc[2,"name"] = "Ranjith"
df

Unnamed: 0,name,state,born
0,Ranjith,Telangana,NaT
1,Pravith,Assam,NaT
2,Ranjith,,NaT


In [282]:
df.drop_duplicates(subset=['name'])

Unnamed: 0,name,state,born
0,Ranjith,Telangana,NaT
1,Pravith,Assam,NaT


In [284]:
df.drop_duplicates(subset=['name'], keep='last')

Unnamed: 0,name,state,born
1,Pravith,Assam,NaT
2,Ranjith,,NaT


In [285]:
df.drop_duplicates(subset=['name'],keep=False)

Unnamed: 0,name,state,born
1,Pravith,Assam,NaT


-----------

In [286]:
df.shape

(3, 3)

In [287]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   name    3 non-null      object        
 1   state   2 non-null      object        
 2   born    0 non-null      datetime64[ns]
dtypes: datetime64[ns](1), object(2)
memory usage: 200.0+ bytes


In [288]:
df['name'].value_counts(dropna=False)

Ranjith    2
Pravith    1
Name: name, dtype: int64

In [292]:
df['state'].value_counts(dropna=True)

Telangana    1
Assam        1
Name: state, dtype: int64
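
The `dropna` argument of value_counts decides whether missing values get their own row. A small sketch with illustrative values, which also shows `normalize`:

```python
import numpy as np
import pandas as pd

state = pd.Series(["T", "A", np.nan, "T"], name="state")

counts = state.value_counts()                  # dropna=True by default: NaN is ignored
with_nan = state.value_counts(dropna=False)    # NaN gets its own row
shares = state.value_counts(normalize=True)    # fractions of the non-missing values

print(counts.to_dict())
print(len(with_nan), with_nan.sum())
print(shares["T"])
```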

---------------

# Assignment

Create a DataFrame with 3 rows and 2 columns that contains only integers.

Run the following DataFrame methods on it:

- df.describe()
- df.mean()
- df.corr()
- df.count()
- df.max()
- df.min()
- df.median()
- df.std()

------------------------

# Dealing with Excel Sheets

In [1]:
import pandas as pd
# reading .xlsx files also requires the openpyxl package to be installed

In [6]:
data = pd.read_excel('data.xlsx')

In [2]:
data2 = pd.read_excel('data.xlsx', sheet_name = 'Sheet2')

In [3]:
data2

In [4]:
data.iloc[0,0] = 34

In [5]:
data

In [None]:
# to_excel creates the file from scratch, so the saved workbook contains only this one sheet
data.to_excel('data.xlsx', sheet_name = 'Sheet2')