### Pandas in Python

**Pandas is used for data manipulation, analysis, and cleaning**

<img src="1_fUO28EIHi1bkZPhjZ451tQ.jpeg">

### Dataframes

**A DataFrame is a two-dimensional, mutable, potentially heterogeneous tabular data structure**

### Series

**A Series is one-dimensional and can hold data of any type, such as int, float, string, or Python objects**
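As a rough illustration of the two definitions above, the sketch below builds one Series and one DataFrame. The names `marks`, `people`, and the `passed` column are made up for this example.

```python
import pandas as pd

# A Series: one-dimensional, labelled values sharing one dtype
marks = pd.Series([90, 25, 65], index=["a", "b", "c"], name="marks")

# A DataFrame: two-dimensional; each column may have its own dtype
people = pd.DataFrame({
    "name": ["Diptam", "Alex", "Albert"],   # strings
    "marks": [90, 25, 65],                  # integers
    "passed": [True, False, True],          # booleans
})

print(marks.dtype)
print(people.dtypes)

# Mutable: a single cell can be changed in place
people.loc[1, "marks"] = 40
```

Each column of `people` is itself a Series, which is why the dtypes are reported per column.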

In [1]:
# Importing Pandas

import pandas as pd
import numpy as np

In [2]:
dict1 = {
    "name":["Diptam", "Alex", "Albert", "Max"],
    "marks":[90,25,65,87],
    "country":["India", "USA", "UAE", "UK"]
}

In [3]:
df = pd.DataFrame(dict1)

In [4]:
df

Unnamed: 0,name,marks,country
0,Diptam,90,India
1,Alex,25,USA
2,Albert,65,UAE
3,Max,87,UK


In [5]:
s = pd.Series([1,2,3,4,5,6,7,np.nan,9,10])

In [6]:
s

0     1.0
1     2.0
2     3.0
3     4.0
4     5.0
5     6.0
6     7.0
7     NaN
8     9.0
9    10.0
dtype: float64

In [7]:
df.to_csv("marksheet.csv")

In [8]:
df

Unnamed: 0,name,marks,country
0,Diptam,90,India
1,Alex,25,USA
2,Albert,65,UAE
3,Max,87,UK


In [9]:
df.to_csv("marksheet_index_false.csv", index=False)
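The difference between the two `to_csv` calls is easiest to see when the file is read back. The sketch below uses in-memory strings instead of files (an assumption made so it runs anywhere); `df_demo` is a hypothetical stand-in for `df`.

```python
import io
import pandas as pd

df_demo = pd.DataFrame({"name": ["Diptam", "Alex"], "marks": [90, 25]})

# With the default index=True, the row labels are written as an unnamed first column
with_index = df_demo.to_csv()
# With index=False, only the data columns are written
without_index = df_demo.to_csv(index=False)

# Reading the first text back yields an extra "Unnamed: 0" column
reread = pd.read_csv(io.StringIO(with_index))
print(list(reread.columns))

# index_col=0 tells read_csv to use that column as the index instead
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
clean = pd.read_csv(io.StringIO(without_index))
```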

In [10]:
train = pd.read_csv("train.csv")

In [11]:
train

Unnamed: 0,Train No,Speed,country
0,12345,90,India
1,45654,25,USA
2,87453,65,UAE
3,45987,87,UK


### head & tail & describe

In [12]:
df

Unnamed: 0,name,marks,country
0,Diptam,90,India
1,Alex,25,USA
2,Albert,65,UAE
3,Max,87,UK


In [13]:
df.head(2)

Unnamed: 0,name,marks,country
0,Diptam,90,India
1,Alex,25,USA


In [14]:
df.tail(3)

Unnamed: 0,name,marks,country
1,Alex,25,USA
2,Albert,65,UAE
3,Max,87,UK


In [15]:
df.describe()

Unnamed: 0,marks
count,4.0
mean,66.75
std,29.981939
min,25.0
25%,55.0
50%,76.0
75%,87.75
max,90.0
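The figures above can be reproduced on the marks alone. The sketch below also shows `include="all"`, which, as far as the pandas API goes, extends the summary to non-numeric columns; `df_demo` is a hypothetical copy of the marksheet.

```python
import pandas as pd

marks = pd.Series([90, 25, 65, 87], name="marks")

summary = marks.describe()
# Quartiles interpolate linearly between sorted values,
# e.g. the median of [25, 65, 87, 90] is (65 + 87) / 2
print(summary["mean"], summary["50%"])

# include="all" also summarises text columns (count, unique, top, freq)
df_demo = pd.DataFrame({"marks": [90, 25, 65, 87],
                        "country": ["India", "USA", "UAE", "UK"]})
full = df_demo.describe(include="all")
print(full)
```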


### Indexing

In [16]:
train

Unnamed: 0,Train No,Speed,country
0,12345,90,India
1,45654,25,USA
2,87453,65,UAE
3,45987,87,UK


In [17]:
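# Chained indexing (train['Speed'][3]) may write to a temporary copy and
# triggers the warning below; train.loc[3, 'Speed'] = 100 is the reliable form
train['Speed'][3] = 100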

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [18]:
train

Unnamed: 0,Train No,Speed,country
0,12345,90,India
1,45654,25,USA
2,87453,65,UAE
3,45987,100,UK
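A single `.loc` call does the same assignment without the chained-indexing warning. This is a minimal sketch on a hypothetical copy of the table, `train_demo`.

```python
import pandas as pd

train_demo = pd.DataFrame({"Train No": [12345, 45654, 87453, 45987],
                           "Speed": [90, 25, 65, 87]})

# One indexing call: row label 3, column "Speed"
train_demo.loc[3, "Speed"] = 100

# .at is the scalar-only equivalent
train_demo.at[0, "Speed"] = 95

print(train_demo)
```

Because row and column are resolved in one step, pandas writes to the frame itself rather than to an intermediate object.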


In [19]:
train.index = [1,2,3,4]

In [20]:
train

Unnamed: 0,Train No,Speed,country
1,12345,90,India
2,45654,25,USA
3,87453,65,UAE
4,45987,100,UK


In [21]:
train.index = ['1st','2nd','3rd','4th']

In [22]:
train

Unnamed: 0,Train No,Speed,country
1st,12345,90,India
2nd,45654,25,USA
3rd,87453,65,UAE
4th,45987,100,UK


### Creating a DataFrame using NumPy

In [23]:
ndf = pd.DataFrame(np.random.rand(20))

In [24]:
ndf

Unnamed: 0,0
0,0.431254
1,0.191688
2,0.433809
3,0.45019
4,0.483599
5,0.012685
6,0.695021
7,0.819955
8,0.197069
9,0.458418


In [25]:
ndf2 = pd.DataFrame(np.random.rand(450,5), index= np.arange(450))

In [26]:
ndf2

Unnamed: 0,0,1,2,3,4
0,0.655629,0.432670,0.893972,0.992994,0.322834
1,0.043093,0.242842,0.990467,0.052881,0.997345
2,0.228683,0.821769,0.942654,0.174405,0.726927
3,0.133659,0.112492,0.503674,0.798843,0.663652
4,0.337167,0.500788,0.086751,0.442609,0.114197
...,...,...,...,...,...
445,0.716484,0.017851,0.132051,0.885571,0.871679
446,0.954789,0.279534,0.862009,0.700447,0.587782
447,0.237142,0.936845,0.906496,0.139861,0.153587
448,0.286106,0.559589,0.160400,0.977675,0.478592


In [27]:
ndf2.head(5)

Unnamed: 0,0,1,2,3,4
0,0.655629,0.43267,0.893972,0.992994,0.322834
1,0.043093,0.242842,0.990467,0.052881,0.997345
2,0.228683,0.821769,0.942654,0.174405,0.726927
3,0.133659,0.112492,0.503674,0.798843,0.663652
4,0.337167,0.500788,0.086751,0.442609,0.114197


### Data types

In [28]:
ndf2.dtypes

0    float64
1    float64
2    float64
3    float64
4    float64
dtype: object

In [29]:
# Storing a string in a float column turns the whole column into dtype object
ndf2[0][0] = "hello"

In [30]:
ndf2.dtypes

0     object
1    float64
2    float64
3    float64
4    float64
dtype: object
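The dtype change can be reproduced on a small frame. The sketch below casts the column to `object` explicitly before storing the string, on the assumption that newer pandas versions are stricter about implicit dtype changes; `frame` and `recovered` are illustrative names.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.rand(4, 2))

# Cast explicitly so the column is allowed to hold a string
frame[0] = frame[0].astype(object)
frame.loc[0, 0] = "hello"
print(frame.dtypes)

# To get numbers back, coerce non-numeric entries to NaN
recovered = pd.to_numeric(frame[0], errors="coerce")
print(recovered.dtype)
```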

In [31]:
ndf2.head(5)

Unnamed: 0,0,1,2,3,4
0,hello,0.43267,0.893972,0.992994,0.322834
1,0.0430929,0.242842,0.990467,0.052881,0.997345
2,0.228683,0.821769,0.942654,0.174405,0.726927
3,0.133659,0.112492,0.503674,0.798843,0.663652
4,0.337167,0.500788,0.086751,0.442609,0.114197


In [32]:
type(ndf2)

pandas.core.frame.DataFrame

In [33]:
type(s)

pandas.core.series.Series

### index vs Column

In [34]:
ndf2

Unnamed: 0,0,1,2,3,4
0,hello,0.432670,0.893972,0.992994,0.322834
1,0.0430929,0.242842,0.990467,0.052881,0.997345
2,0.228683,0.821769,0.942654,0.174405,0.726927
3,0.133659,0.112492,0.503674,0.798843,0.663652
4,0.337167,0.500788,0.086751,0.442609,0.114197
...,...,...,...,...,...
445,0.716484,0.017851,0.132051,0.885571,0.871679
446,0.954789,0.279534,0.862009,0.700447,0.587782
447,0.237142,0.936845,0.906496,0.139861,0.153587
448,0.286106,0.559589,0.160400,0.977675,0.478592


In [35]:
ndf2.index

Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
            ...
            440, 441, 442, 443, 444, 445, 446, 447, 448, 449],
           dtype='int64', length=450)

In [36]:
ndf2.columns

RangeIndex(start=0, stop=5, step=1)

### dataframe to numpy

In [37]:
# Put a number back; column 0 still has dtype object after the "hello" assignment
ndf2[0][0] = 0.4567

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [38]:
ndf2

Unnamed: 0,0,1,2,3,4
0,0.4567,0.432670,0.893972,0.992994,0.322834
1,0.0430929,0.242842,0.990467,0.052881,0.997345
2,0.228683,0.821769,0.942654,0.174405,0.726927
3,0.133659,0.112492,0.503674,0.798843,0.663652
4,0.337167,0.500788,0.086751,0.442609,0.114197
...,...,...,...,...,...
445,0.716484,0.017851,0.132051,0.885571,0.871679
446,0.954789,0.279534,0.862009,0.700447,0.587782
447,0.237142,0.936845,0.906496,0.139861,0.153587
448,0.286106,0.559589,0.160400,0.977675,0.478592


In [39]:
ndf2.to_numpy()

array([[0.4567, 0.4326697850063509, 0.8939722197100213,
        0.9929944262846289, 0.322834036032684],
       [0.043092903797587234, 0.24284159477251033, 0.9904665812350866,
        0.05288107901682504, 0.9973450249024276],
       [0.2286830946175169, 0.8217685079688054, 0.9426542235452163,
        0.17440450761055548, 0.7269271696362685],
       ...,
       [0.23714173103214287, 0.9368447806441272, 0.9064957067290809,
        0.13986126107898222, 0.1535869639098829],
       [0.28610640100351925, 0.5595890659342295, 0.16039989947406053,
        0.9776753623544606, 0.4785916370667792],
       [0.4024468172893628, 0.7266846196971197, 0.14338571518633636,
        0.6773551296969628, 0.4892399918843886]], dtype=object)
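The `dtype=object` in the array above comes from column 0, which stayed an object column after the earlier `"hello"` assignment. A small sketch of how `to_numpy()` picks its dtype, using two hypothetical frames:

```python
import numpy as np
import pandas as pd

numeric = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
mixed = pd.DataFrame({"a": [1.0, 2.0], "b": ["x", "y"]})

print(numeric.to_numpy().dtype)  # all-float columns share one float dtype
print(mixed.to_numpy().dtype)    # no common numeric dtype, so object

# Convert only the numeric part when a plain float array is needed
floats = mixed.select_dtypes(include="number").to_numpy()
```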

### Transpose of Dataframe

In [40]:
ndf2.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,440,441,442,443,444,445,446,447,448,449
0,0.4567,0.0430929,0.228683,0.133659,0.337167,0.425378,0.629703,0.227546,0.703467,0.844613,...,0.0857148,0.466062,0.347175,0.892276,0.311717,0.716484,0.954789,0.237142,0.286106,0.402447
1,0.43267,0.242842,0.821769,0.112492,0.500788,0.17874,0.874137,0.0589259,0.382865,0.492666,...,0.284131,0.970293,0.666334,0.208029,0.535283,0.0178508,0.279534,0.936845,0.559589,0.726685
2,0.893972,0.990467,0.942654,0.503674,0.0867513,0.209132,0.186359,0.555505,0.290828,0.608974,...,0.662385,0.477843,0.253552,0.895205,0.956057,0.132051,0.862009,0.906496,0.1604,0.143386
3,0.992994,0.0528811,0.174405,0.798843,0.442609,0.652231,0.0506797,0.507651,0.165821,0.0746173,...,0.923448,0.970139,0.956177,0.268298,0.848688,0.885571,0.700447,0.139861,0.977675,0.677355
4,0.322834,0.997345,0.726927,0.663652,0.114197,0.272562,0.168261,0.767054,0.489353,0.0868084,...,0.620352,0.960026,0.347684,0.880384,0.799935,0.871679,0.587782,0.153587,0.478592,0.48924


### Sorting

In [41]:
ndf2

Unnamed: 0,0,1,2,3,4
0,0.4567,0.432670,0.893972,0.992994,0.322834
1,0.0430929,0.242842,0.990467,0.052881,0.997345
2,0.228683,0.821769,0.942654,0.174405,0.726927
3,0.133659,0.112492,0.503674,0.798843,0.663652
4,0.337167,0.500788,0.086751,0.442609,0.114197
...,...,...,...,...,...
445,0.716484,0.017851,0.132051,0.885571,0.871679
446,0.954789,0.279534,0.862009,0.700447,0.587782
447,0.237142,0.936845,0.906496,0.139861,0.153587
448,0.286106,0.559589,0.160400,0.977675,0.478592


In [42]:
ndf2.sort_index(axis=1, ascending=False)

Unnamed: 0,4,3,2,1,0
0,0.322834,0.992994,0.893972,0.432670,0.4567
1,0.997345,0.052881,0.990467,0.242842,0.0430929
2,0.726927,0.174405,0.942654,0.821769,0.228683
3,0.663652,0.798843,0.503674,0.112492,0.133659
4,0.114197,0.442609,0.086751,0.500788,0.337167
...,...,...,...,...,...
445,0.871679,0.885571,0.132051,0.017851,0.716484
446,0.587782,0.700447,0.862009,0.279534,0.954789
447,0.153587,0.139861,0.906496,0.936845,0.237142
448,0.478592,0.977675,0.160400,0.559589,0.286106
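`sort_index` only reorders labels. To order rows by their data, `sort_values` is the usual counterpart. A minimal sketch on a hypothetical `scores` frame:

```python
import pandas as pd

scores = pd.DataFrame({"name": ["Diptam", "Alex", "Albert", "Max"],
                       "marks": [90, 25, 65, 87]})

# sort_index orders by labels; axis=1 means the column labels
by_column_label = scores.sort_index(axis=1, ascending=False)

# sort_values orders rows by the data in one or more columns
by_marks = scores.sort_values("marks", ascending=False)
print(by_marks)
```

Both calls return a new frame and leave `scores` unchanged.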


### Is a DataFrame a combination of Series?

In [43]:
type(ndf2[0])

pandas.core.series.Series

### LOC

In [44]:
ndf2

Unnamed: 0,0,1,2,3,4
0,0.4567,0.432670,0.893972,0.992994,0.322834
1,0.0430929,0.242842,0.990467,0.052881,0.997345
2,0.228683,0.821769,0.942654,0.174405,0.726927
3,0.133659,0.112492,0.503674,0.798843,0.663652
4,0.337167,0.500788,0.086751,0.442609,0.114197
...,...,...,...,...,...
445,0.716484,0.017851,0.132051,0.885571,0.871679
446,0.954789,0.279534,0.862009,0.700447,0.587782
447,0.237142,0.936845,0.906496,0.139861,0.153587
448,0.286106,0.559589,0.160400,0.977675,0.478592


In [45]:
# copy is a method: call it to get an independent DataFrame
ndf2_new = ndf2.copy()

In [46]:
#ndf2_new[0][0] = 10
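`DataFrame.copy` has to be called to produce a copy; plain assignment only binds a second name to the same frame. The sketch below contrasts the two on a hypothetical frame named `original`.

```python
import pandas as pd

original = pd.DataFrame({"x": [1.0, 2.0, 3.0]})

alias = original                # same object, no data copied
independent = original.copy()   # separate data (deep copy by default)

independent.loc[0, "x"] = 10.0  # does not touch original
alias.loc[1, "x"] = 20.0        # changes original as well

print(original)
```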

In [47]:
ndf2.loc[0,0] = 0.654

In [48]:
ndf2

Unnamed: 0,0,1,2,3,4
0,0.654,0.432670,0.893972,0.992994,0.322834
1,0.0430929,0.242842,0.990467,0.052881,0.997345
2,0.228683,0.821769,0.942654,0.174405,0.726927
3,0.133659,0.112492,0.503674,0.798843,0.663652
4,0.337167,0.500788,0.086751,0.442609,0.114197
...,...,...,...,...,...
445,0.716484,0.017851,0.132051,0.885571,0.871679
446,0.954789,0.279534,0.862009,0.700447,0.587782
447,0.237142,0.936845,0.906496,0.139861,0.153587
448,0.286106,0.559589,0.160400,0.977675,0.478592


### What if you get the column name wrong?

In [49]:
ndf2.columns = ['A','B','C','D','E']

#ndf2.columns = ['ABCDE']

In [50]:
ndf2

Unnamed: 0,A,B,C,D,E
0,0.654,0.432670,0.893972,0.992994,0.322834
1,0.0430929,0.242842,0.990467,0.052881,0.997345
2,0.228683,0.821769,0.942654,0.174405,0.726927
3,0.133659,0.112492,0.503674,0.798843,0.663652
4,0.337167,0.500788,0.086751,0.442609,0.114197
...,...,...,...,...,...
445,0.716484,0.017851,0.132051,0.885571,0.871679
446,0.954789,0.279534,0.862009,0.700447,0.587782
447,0.237142,0.936845,0.906496,0.139861,0.153587
448,0.286106,0.559589,0.160400,0.977675,0.478592


In [51]:
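# Column 0 was renamed to 'A', so this does not update an existing cell:
# .loc creates a new column labelled 0 instead
ndf2.loc[0,0] = 0.555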

### Dropping Rows or Columns

In [52]:
# Remove the column 0 that was created by mistake
ndf2 = ndf2.drop(0, axis = 1)

In [53]:
ndf2.head()

Unnamed: 0,A,B,C,D,E
0,0.654,0.43267,0.893972,0.992994,0.322834
1,0.0430929,0.242842,0.990467,0.052881,0.997345
2,0.228683,0.821769,0.942654,0.174405,0.726927
3,0.133659,0.112492,0.503674,0.798843,0.663652
4,0.337167,0.500788,0.086751,0.442609,0.114197
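The behaviour above, where assigning through `.loc` with an unknown label adds a column instead of failing, can be seen on a small frame. This is a sketch with made-up labels `"Z"` and `"Y"`.

```python
import pandas as pd

frame = pd.DataFrame([[0.1, 0.2], [0.3, 0.4]], columns=["A", "B"])

# "Z" is not an existing column: .loc enlarges the frame instead of failing
frame.loc[0, "Z"] = 0.555
cols_after_typo = frame.columns.tolist()
print(cols_after_typo)

# Reading a missing label does raise, which is a cheap way to catch typos
try:
    frame.loc[0, "Y"]
except KeyError as err:
    print("missing label:", err)

# Drop the accidental column again
frame = frame.drop(columns="Z")
```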


### Print specific rows or columns

In [54]:
ndf2.loc[[1,2],:]

Unnamed: 0,A,B,C,D,E
1,0.0430929,0.242842,0.990467,0.052881,0.997345
2,0.228683,0.821769,0.942654,0.174405,0.726927


### Print with Conditions

In [55]:
ndf2.loc[(ndf2['B']<0.3)]

Unnamed: 0,A,B,C,D,E
1,0.0430929,0.242842,0.990467,0.052881,0.997345
3,0.133659,0.112492,0.503674,0.798843,0.663652
5,0.425378,0.178740,0.209132,0.652231,0.272562
7,0.227546,0.058926,0.555505,0.507651,0.767054
17,0.0841599,0.098022,0.390816,0.721927,0.174085
...,...,...,...,...,...
435,0.616748,0.059408,0.517450,0.130032,0.933007
440,0.0857148,0.284131,0.662385,0.923448,0.620352
443,0.892276,0.208029,0.895205,0.268298,0.880384
445,0.716484,0.017851,0.132051,0.885571,0.871679


### Print with Multiple Conditions

In [56]:
ndf2.loc[(ndf2['B']<0.3) & (ndf2['C']>0.6)]

Unnamed: 0,A,B,C,D,E
1,0.0430929,0.242842,0.990467,0.052881,0.997345
18,0.336224,0.115146,0.698976,0.396880,0.075527
22,0.298126,0.113701,0.631652,0.514697,0.416942
35,0.320968,0.012688,0.919246,0.780192,0.908137
41,0.330453,0.155310,0.715820,0.976326,0.372235
...,...,...,...,...,...
421,0.246948,0.034596,0.690856,0.212959,0.495426
432,0.97706,0.165278,0.724615,0.368308,0.187790
440,0.0857148,0.284131,0.662385,0.923448,0.620352
443,0.892276,0.208029,0.895205,0.268298,0.880384
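The filters above combine boolean Series with `&`. A short sketch of the other operators, plus `query()` as an alternative spelling, on a small hypothetical frame:

```python
import pandas as pd

frame = pd.DataFrame({"B": [0.1, 0.5, 0.2, 0.9],
                      "C": [0.7, 0.8, 0.3, 0.95]})

# Each comparison yields a boolean Series; wrap each one in parentheses
both = frame.loc[(frame["B"] < 0.3) & (frame["C"] > 0.6)]    # AND
either = frame.loc[(frame["B"] < 0.3) | (frame["C"] > 0.6)]  # OR
negated = frame.loc[~(frame["B"] < 0.3)]                     # NOT

# query() expresses the same AND filter as a string
same = frame.query("B < 0.3 and C > 0.6")
```

The parentheses matter because `&` and `|` bind more tightly than `<` and `>` in Python.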


### iloc

In [57]:
ndf2

Unnamed: 0,A,B,C,D,E
0,0.654,0.432670,0.893972,0.992994,0.322834
1,0.0430929,0.242842,0.990467,0.052881,0.997345
2,0.228683,0.821769,0.942654,0.174405,0.726927
3,0.133659,0.112492,0.503674,0.798843,0.663652
4,0.337167,0.500788,0.086751,0.442609,0.114197
...,...,...,...,...,...
445,0.716484,0.017851,0.132051,0.885571,0.871679
446,0.954789,0.279534,0.862009,0.700447,0.587782
447,0.237142,0.936845,0.906496,0.139861,0.153587
448,0.286106,0.559589,0.160400,0.977675,0.478592


In [58]:
ndf2.iloc[0,4]

0.322834036032684

In [59]:
ndf2.iloc[[0,5,1],[0,2]]

Unnamed: 0,A,C
0,0.654,0.893972
5,0.425378,0.209132
1,0.0430929,0.990467
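With the default 0..n-1 index, `.loc` and `.iloc` often return the same rows, which hides the difference. The sketch below uses a deliberately shuffled index (an assumption for illustration) to separate labels from positions.

```python
import pandas as pd

frame = pd.DataFrame({"A": [10, 20, 30], "C": [1, 2, 3]},
                     index=[5, 0, 1])

by_label = frame.loc[0, "A"]      # row whose label is 0
by_position = frame.iloc[0, 0]    # first row, first column

# Slices differ too: .loc includes the end label, .iloc excludes the end position
label_slice = frame.loc[5:0]
position_slice = frame.iloc[0:1]

print(by_label, by_position)
```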


### Inplace

In [60]:
ndf2.drop(['E','B'], axis=1, inplace=True)

In [61]:
ndf2

Unnamed: 0,A,C,D
0,0.654,0.893972,0.992994
1,0.0430929,0.990467,0.052881
2,0.228683,0.942654,0.174405
3,0.133659,0.503674,0.798843
4,0.337167,0.086751,0.442609
...,...,...,...
445,0.716484,0.132051,0.885571
446,0.954789,0.862009,0.700447
447,0.237142,0.906496,0.139861
448,0.286106,0.160400,0.977675


### reset

In [62]:
ndf2.drop([1,3,5], axis=0, inplace=True)

In [63]:
ndf2

Unnamed: 0,A,C,D
0,0.654,0.893972,0.992994
2,0.228683,0.942654,0.174405
4,0.337167,0.086751,0.442609
6,0.629703,0.186359,0.050680
7,0.227546,0.555505,0.507651
...,...,...,...
445,0.716484,0.132051,0.885571
446,0.954789,0.862009,0.700447
447,0.237142,0.906496,0.139861
448,0.286106,0.160400,0.977675


In [64]:
ndf2.reset_index(drop = True, inplace=True)

In [65]:
ndf2

Unnamed: 0,A,C,D
0,0.654,0.893972,0.992994
1,0.228683,0.942654,0.174405
2,0.337167,0.086751,0.442609
3,0.629703,0.186359,0.050680
4,0.227546,0.555505,0.507651
...,...,...,...
442,0.716484,0.132051,0.885571
443,0.954789,0.862009,0.700447
444,0.237142,0.906496,0.139861
445,0.286106,0.160400,0.977675


### isnull()

In [66]:
ndf2.isnull()

Unnamed: 0,A,C,D
0,False,False,False
1,False,False,False
2,False,False,False
3,False,False,False
4,False,False,False
...,...,...,...
442,False,False,False
443,False,False,False
444,False,False,False
445,False,False,False


In [67]:
ndf2.loc[:,['C']] = None

In [68]:
ndf2.head()

Unnamed: 0,A,C,D
0,0.654,,0.992994
1,0.228683,,0.174405
2,0.337167,,0.442609
3,0.629703,,0.05068
4,0.227546,,0.507651


In [69]:
ndf2.isnull()

Unnamed: 0,A,C,D
0,False,True,False
1,False,True,False
2,False,True,False
3,False,True,False
4,False,True,False
...,...,...,...
442,False,True,False
443,False,True,False
444,False,True,False
445,False,True,False


### drop Null Values

In [70]:
df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Batman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})

In [71]:
df

Unnamed: 0,name,toy,born
0,Alfred,,NaT
1,Batman,Batmobile,1940-04-25
2,Batman,Bullwhip,NaT


In [72]:
df.dropna(how='all', axis= 1)

Unnamed: 0,name,toy,born
0,Alfred,,NaT
1,Batman,Batmobile,1940-04-25
2,Batman,Bullwhip,NaT
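No column in this frame is entirely null, so `how='all', axis=1` leaves it unchanged. The sketch below compares a few other `dropna` settings on the same data, held in a hypothetical variable `heroes`.

```python
import numpy as np
import pandas as pd

heroes = pd.DataFrame({"name": ["Alfred", "Batman", "Batman"],
                       "toy": [np.nan, "Batmobile", "Bullwhip"],
                       "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

any_row = heroes.dropna()                    # drop rows with at least one null
all_col = heroes.dropna(how="all", axis=1)   # drop columns that are entirely null
by_subset = heroes.dropna(subset=["toy"])    # only consider the "toy" column
by_thresh = heroes.dropna(thresh=2)          # keep rows with at least 2 non-null values
```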


### drop duplicates

In [73]:
df

Unnamed: 0,name,toy,born
0,Alfred,,NaT
1,Batman,Batmobile,1940-04-25
2,Batman,Bullwhip,NaT


In [74]:
df.drop_duplicates(subset='name', keep = False)

Unnamed: 0,name,toy,born
0,Alfred,,NaT
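`keep=False` removes every row whose name occurs more than once, which is why only Alfred remains. A sketch of the other `keep` options on similar data (the variable names are illustrative):

```python
import pandas as pd

heroes = pd.DataFrame({"name": ["Alfred", "Batman", "Batman"],
                       "toy": [None, "Batmobile", "Bullwhip"]})

first = heroes.drop_duplicates(subset="name")                    # keep="first" is the default
last = heroes.drop_duplicates(subset="name", keep="last")        # keep the last occurrence
unique_only = heroes.drop_duplicates(subset="name", keep=False)  # drop all duplicated rows

# duplicated() returns the boolean mask these calls are based on
mask = heroes.duplicated(subset="name", keep=False)
```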


### Miscellaneous

In [75]:
ndf2.shape

(447, 3)

In [76]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   name    3 non-null      object        
 1   toy     2 non-null      object        
 2   born    1 non-null      datetime64[ns]
dtypes: datetime64[ns](1), object(2)
memory usage: 200.0+ bytes


In [77]:
df['toy'].value_counts(dropna=False)

Batmobile    1
Bullwhip     1
NaN          1
Name: toy, dtype: int64

In [78]:
ndf2.isnull()

Unnamed: 0,A,C,D
0,False,True,False
1,False,True,False
2,False,True,False
3,False,True,False
4,False,True,False
...,...,...,...
442,False,True,False
443,False,True,False
444,False,True,False
445,False,True,False


In [79]:
ndf2.notnull()

Unnamed: 0,A,C,D
0,True,False,True
1,True,False,True
2,True,False,True
3,True,False,True
4,True,False,True
...,...,...,...
442,True,False,True
443,True,False,True
444,True,False,True
445,True,False,True


### for loops

In [80]:
ndf2

Unnamed: 0,A,C,D
0,0.654,,0.992994
1,0.228683,,0.174405
2,0.337167,,0.442609
3,0.629703,,0.050680
4,0.227546,,0.507651
...,...,...,...
442,0.716484,,0.885571
443,0.954789,,0.700447
444,0.237142,,0.139861
445,0.286106,,0.977675


In [81]:
for index,row in ndf2.iterrows():
    print(index,row['A'])

0 0.654
1 0.2286830946175169
2 0.3371669259446479
3 0.6297026782034774
4 0.22754555352339145
5 0.7034669400938335
6 0.8446128326446023
7 0.25784563929699833
8 0.16191708758990975
9 0.6636232585549814
10 0.5860080749122665
11 0.5599727595777187
12 0.508188934796957
13 0.7564327063637906
14 0.08415993896988982
15 0.33622394901392816
16 0.43040243649642385
17 0.615276392950946
18 0.04588587669444999
19 0.2981258998751024
20 0.8742493254823142
21 0.5759644660590809
22 0.6267835966482371
23 0.8409230280226144
24 0.31089530672563137
25 0.6591560666145146
26 0.8988065617269068
27 0.27553009745789003
28 0.36646641289605075
29 0.5726364486950599
30 0.21768892474906187
31 0.5211273241706522
32 0.32096765226081714
33 0.3627990110207687
34 0.8777034579726701
35 0.8998088851205609
36 0.9521722748121213
37 0.7182680560752527
38 0.33045272720398244
39 0.3736721975656122
40 0.8936719211298624
41 0.3258684118418107
42 0.13919378542939687
43 0.5324084622960401
44 0.1620895942074374
45 0.6005491806164451
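`iterrows` builds a Series for every row, which tends to be slow on larger frames. As a sketch of two common alternatives, assuming a small hypothetical frame with columns `A` and `D`:

```python
import pandas as pd

frame = pd.DataFrame({"A": [0.654, 0.228, 0.337], "D": [0.99, 0.17, 0.44]})

# itertuples yields lightweight namedtuples, one per row
for row in frame.itertuples():
    print(row.Index, row.A)

# Whole-column arithmetic avoids the Python loop altogether
total = (frame["A"] + frame["D"]).sum()

# The same result computed row by row, for comparison
looped = sum(row["A"] + row["D"] for _, row in frame.iterrows())
```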

### Adding a new Column

In [82]:
ndf2['E'] = ndf2['D'] + ndf2['A']

In [83]:
ndf2.head(5)

Unnamed: 0,A,C,D,E
0,0.654,,0.992994,1.64699
1,0.228683,,0.174405,0.403088
2,0.337167,,0.442609,0.779776
3,0.629703,,0.05068,0.680382
4,0.227546,,0.507651,0.735196


### string contains

In [84]:
df

Unnamed: 0,name,toy,born
0,Alfred,,NaT
1,Batman,Batmobile,1940-04-25
2,Batman,Bullwhip,NaT


In [85]:
df.loc[2,'name'] = 'Batwoman'

In [86]:
df.loc[df['name'].str.contains('Bat')]

Unnamed: 0,name,toy,born
1,Batman,Batmobile,1940-04-25
2,Batwoman,Bullwhip,NaT


In [87]:
import re

In [88]:
df.loc[df['name'].str.contains('bat|man', regex=True)]

Unnamed: 0,name,toy,born
1,Batman,Batmobile,1940-04-25
2,Batwoman,Bullwhip,NaT
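Note that the lowercase `bat` in the pattern above matches nothing, since matching is case-sensitive by default; both rows are selected through the `man` alternative. A sketch of the `case` and `na` parameters of `str.contains`, on a hypothetical Series that also has a missing name:

```python
import numpy as np
import pandas as pd

names = pd.Series(["Alfred", "Batman", "Batwoman", np.nan])

# Case-sensitive by default: only the "man" alternative selects rows here
default = names.str.contains("bat|man", regex=True, na=False)

# case=False ignores case; na=False treats missing names as non-matches
insensitive = names.str.contains("bat", case=False, na=False)

# Lowercase "bat" alone, case-sensitive, matches nothing
strict = names.str.contains("bat", na=False)
```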


### Conditional Changes

In [89]:
df

Unnamed: 0,name,toy,born
0,Alfred,,NaT
1,Batman,Batmobile,1940-04-25
2,Batwoman,Bullwhip,NaT


In [90]:
df.loc[df['name'] == 'Batman', 'name'] = 'Batwoman'

In [91]:
df

Unnamed: 0,name,toy,born
0,Alfred,,NaT
1,Batwoman,Batmobile,1940-04-25
2,Batwoman,Bullwhip,NaT


In [92]:
df.loc[df['name'] == 'Batwoman', 'toy'] = 'Batmobile'

In [93]:
df

Unnamed: 0,name,toy,born
0,Alfred,,NaT
1,Batwoman,Batmobile,1940-04-25
2,Batwoman,Batmobile,NaT


### Working with Large Files

In [94]:
ndf2

Unnamed: 0,A,C,D,E
0,0.654,,0.992994,1.64699
1,0.228683,,0.174405,0.403088
2,0.337167,,0.442609,0.779776
3,0.629703,,0.050680,0.680382
4,0.227546,,0.507651,0.735196
...,...,...,...,...
442,0.716484,,0.885571,1.60206
443,0.954789,,0.700447,1.65524
444,0.237142,,0.139861,0.377003
445,0.286106,,0.977675,1.26378


In [96]:
ndf2.to_csv("Large_file.csv")
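When a CSV is too large to load at once, `read_csv` accepts `chunksize` and then returns an iterator of DataFrames. The sketch below aggregates chunk by chunk; it reads from an in-memory string as a stand-in for a real file, and the column `A` and its values are made up.

```python
import io
import pandas as pd

# Stand-in for a large file on disk (hypothetical data)
csv_text = pd.DataFrame({"A": range(12)}).to_csv(index=False)

total = 0
n_chunks = 0
# chunksize makes read_csv return an iterator of DataFrames
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=5):
    total += chunk["A"].sum()   # aggregate per chunk, keep only the running result
    n_chunks += 1

print(n_chunks, total)
```

Only one chunk is held in memory at a time, so the running total is the only state that grows.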

In [100]:
# Use a separate name for each chunk so the earlier df is not overwritten
for chunk in pd.read_csv("Large_file.csv", chunksize=5):
    print("separated")
    print(chunk)

separated
   Unnamed: 0         A   C         D         E
0           0  0.654000 NaN  0.992994  1.646994
1           1  0.228683 NaN  0.174405  0.403088
2           2  0.337167 NaN  0.442609  0.779776
3           3  0.629703 NaN  0.050680  0.680382
4           4  0.227546 NaN  0.507651  0.735196
separated
   Unnamed: 0         A   C         D         E
5           5  0.703467 NaN  0.165821  0.869288
6           6  0.844613 NaN  0.074617  0.919230
7           7  0.257846 NaN  0.507080  0.764926
8           8  0.161917 NaN  0.517926  0.679843
9           9  0.663623 NaN  0.728576  1.392199
separated
    Unnamed: 0         A   C         D         E
10          10  0.586008 NaN  0.826688  1.412696
11          11  0.559973 NaN  0.960057  1.520030
12          12  0.508189 NaN  0.227246  0.735435
13          13  0.756433 NaN  0.017387  0.773820
14          14  0.084160 NaN  0.721927  0.806087
separated
    Unnamed: 0         A   C         D         E
15          15  0.336224 NaN  0.396880  0

separated
     Unnamed: 0         A   C         D         E
225         225  0.073054 NaN  0.095980  0.169035
226         226  0.385875 NaN  0.393952  0.779828
227         227  0.735811 NaN  0.652688  1.388499
228         228  0.988583 NaN  0.051844  1.040427
229         229  0.144384 NaN  0.761709  0.906093
separated
     Unnamed: 0         A   C         D         E
230         230  0.742084 NaN  0.994656  1.736740
231         231  0.832062 NaN  0.012377  0.844439
232         232  0.861240 NaN  0.116584  0.977824
233         233  0.937165 NaN  0.711869  1.649034
234         234  0.746862 NaN  0.550870  1.297732
separated
     Unnamed: 0         A   C         D         E
235         235  0.668896 NaN  0.964777  1.633674
236         236  0.319612 NaN  0.818247  1.137858
237         237  0.031397 NaN  0.374472  0.405869
238         238  0.670373 NaN  0.677722  1.348095
239         239  0.644209 NaN  0.400326  1.044535
separated
     Unnamed: 0         A   C         D         E
240       