![Pandas_logo](pandas.jpg)

### <b>What Is Pandas In Python?</b>
Pandas is an open source Python package that is widely used for data science, data analysis, and machine learning tasks. It is built on top of another package named NumPy, which provides support for multi-dimensional arrays. As one of the most popular data wrangling packages, Pandas works well with many other data science modules inside the Python ecosystem. It is not part of the Python standard library, but it is included in many data-science distributions, including commercial vendor distributions like ActiveState's ActivePython, and can otherwise be installed with pip.

### <b>What Can You Do With DataFrames Using Pandas?</b>

Pandas makes it simple to do many of the time consuming, repetitive tasks associated with working with data, including:

- Data cleansing
- Data fill
- Data normalization
- Merges and joins
- Data visualization
- Statistical analysis
- Data inspection
- Loading and saving data
- And much more

In fact, Pandas covers so much of the everyday data workflow that many data scientists regard it as the best data analysis and manipulation tool available.

In [1]:
# Import the pandas and NumPy libraries
import pandas as pd
import numpy as np

### Pandas series creation

In [2]:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

In [3]:
# Checking the index of pd_series
data.index

RangeIndex(start=0, stop=4, step=1)

In [4]:
data[1:3]

1    0.50
2    0.75
dtype: float64

In [5]:
# The index of a Pandas Series can be defined explicitly
data = pd.Series([0.25, 0.5, 0.75, 1.0],index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [6]:
# Series elements can be accessed by their index labels
data['b']

0.5

In [7]:
# Series as specialized dictionary
population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127, 'Florida': 19552860, 'Illinois': 12882135}
population_dict

{'California': 38332521,
 'Texas': 26448193,
 'New York': 19651127,
 'Florida': 19552860,
 'Illinois': 12882135}
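
The dictionary itself is not yet a Series. A minimal sketch of the conversion follows; the variable name `population` is our own choice, and the figures are the ones used in the DataFrame example later in this notebook.

```python
import pandas as pd

# Population figures as used in the DataFrame example later in this notebook
population_dict = {'California': 38332521, 'Texas': 26448193,
                   'New York': 19651127, 'Florida': 19552860,
                   'Illinois': 12882135}

# A Series built from a dict uses the dictionary keys as its index
population = pd.Series(population_dict)

# Dictionary-style lookup by key
print(population['California'])

# Unlike a plain dict, a Series also supports array-style slicing by label
print(population['California':'New York'])
```

The label slice is what sets a Series apart from an ordinary dictionary: the keys have an order, so a range of them can be selected at once.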

In [8]:
# Constructing Series objects
pd.Series([2, 4, 6])

0    2
1    4
2    6
dtype: int64

In [9]:
# The index can be defined however we want
pd.Series(5, index=[100, 200, 300])


100    5
200    5
300    5
dtype: int64

In [10]:
# With a dictionary, the index takes the dictionary keys in the order they were inserted:
pd.Series({2:'a', 1:'b', 3:'c'})

2    a
1    b
3    c
dtype: object

### Data Selection in Series


In [11]:
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [12]:
data.keys()

Index(['a', 'b', 'c', 'd'], dtype='object')

In [13]:
# items() is a method, so it has to be called; wrap it in list() to see the pairs
list(data.items())

[('a', 0.25), ('b', 0.5), ('c', 0.75), ('d', 1.0)]

In [14]:
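# Positional access with iloc (plain data[0] on a label-based index is deprecated in recent pandas versions)
data.iloc[0]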

0.25

#### A Series builds on this dictionary-like interface and provides array-style item selection via the same basic mechanisms as NumPy arrays—that is, slices, masking, and fancy indexing.

In [15]:
# slicing by explicit index
data['a':'c']

a    0.25
b    0.50
c    0.75
dtype: float64

In [16]:
# slicing by implicit integer index
data[0:2]

a    0.25
b    0.50
dtype: float64

In [17]:
# masking
data[(data > 0.3) & (data < 0.8)]

b    0.50
c    0.75
dtype: float64

#### Notice that when you are slicing with an explicit index (i.e., data['a':'c']), the final index is included in the slice, while when you're slicing with an implicit index (i.e., data[0:2]), the final index is excluded from the slice.
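
The difference between the two slicing conventions is easiest to trip over when the explicit index is itself made of integers. The sketch below uses a small made-up Series (not part of the original notebook) and the `loc` and `iloc` accessors, which make the choice explicit.

```python
import pandas as pd

# Hypothetical example: a Series whose explicit index consists of integers
s = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# loc always refers to the explicit index labels; the end label is included
by_label = s.loc[1:3]

# iloc always refers to the implicit position; the end position is excluded
by_position = s.iloc[1:3]

print(by_label)
print(by_position)
```

Using `loc` and `iloc` rather than bare square brackets keeps the code readable regardless of what kind of index the Series has.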


In [18]:
# fancy indexing
data[['a', 'd']]

a    0.25
d    1.00
dtype: float64

#### DataFrame as two-dimensional array

In [19]:
area = pd.Series({'California': 423967, 'Texas': 695662,
 'New York': 141297, 'Florida': 170312,
 'Illinois': 149995})
pop = pd.Series({'California': 38332521, 'Texas': 26448193,
 'New York': 19651127, 'Florida': 19552860,
 'Illinois': 12882135})
data = pd.DataFrame({'area':area, 'pop':pop})
data


              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135


In [20]:
# Transpose the data (swap rows and columns)
data.T

      California     Texas  New York   Florida  Illinois
area      423967    695662    141297    170312    149995
pop     38332521  26448193  19651127  19552860  12882135


In [21]:
# Selection via loc
data.loc[:'Florida', :'pop']

              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860


In [22]:
# Selection via iloc
data.iloc[: , :1]

              area
California  423967
Texas       695662
New York    141297
Florida     170312
Illinois    149995
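
The `loc` indexer also accepts a Boolean mask for the rows together with a list of column labels. The sketch below rebuilds the same area and population frame and adds a `density` column; that column and the threshold of 100 are our own illustration, not part of the original notebook.

```python
import pandas as pd

area = pd.Series({'California': 423967, 'Texas': 695662,
                  'New York': 141297, 'Florida': 170312,
                  'Illinois': 149995})
pop = pd.Series({'California': 38332521, 'Texas': 26448193,
                 'New York': 19651127, 'Florida': 19552860,
                 'Illinois': 12882135})
data = pd.DataFrame({'area': area, 'pop': pop})

# Add a derived column computed element-wise from the existing ones
data['density'] = data['pop'] / data['area']

# Combine a Boolean mask (rows) with a list of labels (columns) in loc
dense = data.loc[data['density'] > 100, ['pop', 'density']]
print(dense)
```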


### Handling missing values in data

- isnull(): Generate a Boolean mask indicating missing values
- notnull(): Opposite of isnull()
- dropna(): Return a filtered version of the data
- fillna(): Return a copy of the data with missing values filled or imputed

In [23]:
#Detecting null values
data = pd.Series([1, np.nan, 'hello', None])
data.isnull()

0    False
1     True
2    False
3     True
dtype: bool

In [24]:
data[data.notnull()]


0        1
2    hello
dtype: object

In [25]:
# Dropping null values
# dropna() (which removes NA values) 
# fillna() (which fills in NA values)
data.dropna()

0        1
2    hello
dtype: object

In [26]:
# drop Na for Data frame
# For a DataFrame, there are more options. Consider the following DataFrame:
df = pd.DataFrame([[1, np.nan, 2],
[2, 3, 5],
[np.nan, 4, 6]])

In [27]:
df

     0    1  2
0  1.0  NaN  2
1  2.0  3.0  5
2  NaN  4.0  6


In [28]:
# We cannot drop single values from a DataFrame; we can only drop full rows or full columns.
# By default, dropna() will drop all rows in which any null value is present:
df.dropna()

     0    1  2
1  2.0  3.0  5


In [29]:
# Alternatively, you can drop NA values along a different axis; axis=1 drops all columns containing a null value:
df.dropna(axis=1)

   2
0  2
1  5
2  6


In [30]:
# You can also specify how='all', which will only drop rows/columns that are all null values
df.dropna(axis='columns' , how='all') 

     0    1  2
0  1.0  NaN  2
1  2.0  3.0  5
2  NaN  4.0  6


In [31]:
# For finer-grained control, the thresh parameter lets you specify a minimum number of non-null values for the row/column to be kept:
df.dropna(axis='rows' , thresh=3) 

     0    1  2
1  2.0  3.0  5


### Filling null values


In [32]:
# Sometimes rather than dropping NA values, you’d rather replace them with a valid value.
data = pd.Series([1, np.nan, 2, None, 3], index=list('abcde'))
data

a    1.0
b    NaN
c    2.0
d    NaN
e    3.0
dtype: float64

In [33]:
# We can fill NA entries with a single value, such as zero:
data.fillna(0)

a    1.0
b    0.0
c    2.0
d    0.0
e    3.0
dtype: float64

In [34]:
# Forward-fill: propagate the previous value forward
# (fillna(method='ffill') is deprecated in recent pandas versions; ffill() is the direct equivalent)
data.ffill()

a    1.0
b    1.0
c    2.0
d    2.0
e    3.0
dtype: float64

In [35]:
# Back-fill: propagate the next value backward
# (fillna(method='bfill') is deprecated in recent pandas versions; bfill() is the direct equivalent)
data.bfill()

a    1.0
b    2.0
c    2.0
d    3.0
e    3.0
dtype: float64

In [36]:
# Fill with the mean of the data
data.fillna(data.mean())

a    1.0
b    2.0
c    2.0
d    2.0
e    3.0
dtype: float64
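
The same mean-fill idea carries over to a DataFrame. As a rough sketch, reusing the small frame from the dropna() examples above, `df.mean()` returns one mean per column and `fillna` matches those means to the columns by label.

```python
import numpy as np
import pandas as pd

# Same small frame as in the dropna() examples
df = pd.DataFrame([[1, np.nan, 2],
                   [2, 3, 5],
                   [np.nan, 4, 6]])

# df.mean() skips NaN and returns one mean per column;
# fillna aligns that Series on the column labels
filled = df.fillna(df.mean())
print(filled)
```

Each missing value is replaced by the mean of its own column rather than by a single global value.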

### Combining Datasets: Concat and Append

In [37]:
# Simple Concatenation with pd.concat
ser1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
ser2 = pd.Series(['D', 'E', 'F'], index=[4, 5, 6])
pd.concat([ser1, ser2])

1    A
2    B
3    C
4    D
5    E
6    F
dtype: object

In [38]:
area = pd.Series({'California': 423967, 'Texas': 695662,
 'New York': 141297, 'Florida': 170312,
 'Illinois': 149995})
pop = pd.Series({'California': 38332521, 'Texas': 26448193,
 'New York': 19651127, 'Florida': 19552860,
 'Illinois': 12882135})
data = pd.DataFrame({'area':area, 'pop':pop})
data


              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135


In [39]:
# pd.concat also works on higher-dimensional objects, such as DataFrames.
# One important difference between np.concatenate and pd.concat is that Pandas concatenation preserves labels, even if the result has duplicates.
# With axis=1 the frames are placed side by side, so the column labels 'area' and 'pop' each appear twice:
pd.concat([data, data], axis=1)

              area       pop    area       pop
California  423967  38332521  423967  38332521
Texas       695662  26448193  695662  26448193
New York    141297  19651127  141297  19651127
Florida     170312  19552860  170312  19552860
Illinois    149995  12882135  149995  12882135


In [40]:
# The append() method was a shorthand for pd.concat. It was deprecated in pandas 1.4 and removed in pandas 2.0, so pd.concat is used here instead.
pd.concat([data, data])

              area       pop
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135
California  423967  38332521
Texas       695662  26448193
New York    141297  19651127
Florida     170312  19552860
Illinois    149995  12882135
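
Because concatenation keeps the original labels, the result above has every row label twice. The sketch below, on two small made-up frames `x` and `y`, shows three `pd.concat` options for dealing with that: `ignore_index`, `keys`, and `verify_integrity`.

```python
import pandas as pd

# Hypothetical example frames that share the row labels 0 and 1
x = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
y = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Default: row labels are preserved, so 0 and 1 each appear twice
dup = pd.concat([x, y])

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index
fresh = pd.concat([x, y], ignore_index=True)

# keys= labels each source frame, producing a hierarchical (multi-level) index
keyed = pd.concat([x, y], keys=['x', 'y'])

# verify_integrity=True raises an error instead of silently creating duplicates
try:
    pd.concat([x, y], verify_integrity=True)
except ValueError as err:
    print('ValueError:', err)
```

Which option fits depends on whether the original labels carry meaning: drop them with `ignore_index`, or keep them and record the source with `keys`.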


In [41]:
# We will use the Titanic dataset available in the seaborn library
import seaborn as sns
df = sns.load_dataset('titanic')

### Combining Datasets: Merge and Join

In [42]:
# Categories of joins
# The pd.merge() function implements several types of joins: one-to-one, many-to-one, and many-to-many.
# One-to-one join:
df1 = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa', 'Sue'],
 'group': ['Accounting', 'Engineering', 'Engineering', 'HR']})

df2 = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake', 'Sue'],
 'hire_date': [2004, 2008, 2012, 2014]})
print(df1)
print('________________')
print(df2)

  employee        group
0      Bob   Accounting
1     Jake  Engineering
2     Lisa  Engineering
3      Sue           HR
________________
  employee  hire_date
0     Lisa       2004
1      Bob       2008
2     Jake       2012
3      Sue       2014


In [43]:
# Merge the two datasets into one; pd.merge joins on the shared 'employee' column
df3 = pd.merge(df1, df2)
df3

  employee        group  hire_date
0      Bob   Accounting       2008
1     Jake  Engineering       2012
2     Lisa  Engineering       2004
3      Sue           HR       2014


In [44]:
# Many-to-one joins
df4 = pd.DataFrame({'group': ['Accounting', 'Engineering', 'HR'],
 'supervisor': ['Carly', 'Guido', 'Steve']})
df5 = pd.merge(df3 ,df4)
df5

  employee        group  hire_date supervisor
0      Bob   Accounting       2008      Carly
1     Jake  Engineering       2012      Guido
2     Lisa  Engineering       2004      Guido
3      Sue           HR       2014      Steve


In [45]:
# Many-to-many
df6 = pd.DataFrame({'group': ['Accounting', 'Accounting', 'Engineering', 'Engineering', 'HR', 'HR'],
'skills': ['math', 'spreadsheets', 'coding', 'linux', 'spreadsheets', 'organization']})
df7 = pd.merge(df5 , df6)
df7

  employee        group  hire_date supervisor        skills
0      Bob   Accounting       2008      Carly          math
1      Bob   Accounting       2008      Carly  spreadsheets
2     Jake  Engineering       2012      Guido        coding
3     Jake  Engineering       2012      Guido         linux
4     Lisa  Engineering       2004      Guido        coding
5     Lisa  Engineering       2004      Guido         linux
6      Sue           HR       2014      Steve  spreadsheets
7      Sue           HR       2014      Steve  organization
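
All of the merges above rely on the defaults: pd.merge finds the shared column by itself and keeps only keys present in both frames. The sketch below spells out the `on` and `how` parameters on two small made-up frames (`staff` and `hires` are illustrative names) whose keys only partly overlap.

```python
import pandas as pd

# Hypothetical frames: Jake has no hire date, Sue has no group
staff = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa'],
                      'group': ['Accounting', 'Engineering', 'Engineering']})
hires = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Sue'],
                      'hire_date': [2004, 2008, 2014]})

# Inner join (the default): only employees present in both frames
inner = pd.merge(staff, hires, on='employee')

# Outer join: every employee from either frame; gaps are filled with NaN
outer = pd.merge(staff, hires, on='employee', how='outer')

# Left join: keep every row of the left frame, matched or not
left = pd.merge(staff, hires, on='employee', how='left')

print(inner)
print(outer)
print(left)
```

Naming the key with `on` is a defensive habit: it keeps the merge from silently changing behavior if the frames later gain another column name in common.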


### Aggregation and Grouping

#### An essential piece of analysis of large data is efficient summarization: computing aggregations like sum(), mean(), median(), min(), and max(), in which a single number gives insight into the nature of a potentially large dataset.

In [46]:
# For this example we will use the 'titanic' dataset from seaborn
import seaborn as sns
data = sns.load_dataset('titanic')
data.head()

   survived  pclass     sex   age  sibsp  parch     fare embarked  class    who  adult_male deck  embark_town alive  alone
0         0       3    male  22.0      1      0   7.2500        S  Third    man        True  NaN  Southampton    no  False
1         1       1  female  38.0      1      0  71.2833        C  First  woman       False    C    Cherbourg   yes  False
2         1       3  female  26.0      0      0   7.9250        S  Third  woman       False  NaN  Southampton   yes   True
3         1       1  female  35.0      1      0  53.1000        S  First  woman       False    C  Southampton   yes  False
4         0       3    male  35.0      0      0   8.0500        S  Third    man        True  NaN  Southampton    no   True


In [47]:
print('mean age:', data['age'].mean())
print('median age:', data['age'].median())
print('minimum age:', data['age'].min())
print('maximum age:', data['age'].max())

mean age: 29.69911764705882
median age: 28.0
minimum age: 0.42
maximum age: 80.0


In [48]:
# The describe() method computes several common summary statistics for every numeric column in a single call
data.describe()

         survived      pclass         age       sibsp       parch        fare
count  891.000000  891.000000  714.000000  891.000000  891.000000  891.000000
mean     0.383838    2.308642   29.699118    0.523008    0.381594   32.204208
std      0.486592    0.836071   14.526497    1.102743    0.806057   49.693429
min      0.000000    1.000000    0.420000    0.000000    0.000000    0.000000
25%      0.000000    2.000000   20.125000    0.000000    0.000000    7.910400
50%      0.000000    3.000000   28.000000    0.000000    0.000000   14.454200
75%      1.000000    3.000000   38.000000    1.000000    0.000000   31.000000
max      1.000000    3.000000   80.000000    8.000000    6.000000  512.329200


### GroupBy: Split, Apply, Combine

#### Simple aggregations can give you a flavor of your dataset, but often we would prefer to aggregate conditionally on some label or index: this is implemented in the so-called groupby operation.

#### Split, apply, combine
##### The figure below shows a canonical example of this split-apply-combine operation, where the "apply" step is a summation aggregation.

![Split, apply, combine](groupby_split_combile.PNG)
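
To make the three steps concrete, the sketch below carries them out by hand on a small made-up frame (the column names `key` and `data` are our own) and then compares the result with a single groupby call. It is an illustration of the idea, not of how Pandas implements it internally.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data': range(6)})

# Split: select the rows for each key. Apply: sum them. Combine: collect the results.
manual = {}
for key in df['key'].unique():
    manual[key] = df.loc[df['key'] == key, 'data'].sum()
manual = pd.Series(manual)

# groupby performs the same three steps in one call
grouped = df.groupby('key')['data'].sum()

print(manual)
print(grouped)
```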

In [49]:
data.head()

   survived  pclass     sex   age  sibsp  parch     fare embarked  class    who  adult_male deck  embark_town alive  alone
0         0       3    male  22.0      1      0   7.2500        S  Third    man        True  NaN  Southampton    no  False
1         1       1  female  38.0      1      0  71.2833        C  First  woman       False    C    Cherbourg   yes  False
2         1       3  female  26.0      0      0   7.9250        S  Third  woman       False  NaN  Southampton   yes   True
3         1       1  female  35.0      1      0  53.1000        S  First  woman       False    C  Southampton   yes  False
4         0       3    male  35.0      0      0   8.0500        S  Third    man        True  NaN  Southampton    no   True


In [50]:
# numeric_only=True restricts the sum to numeric and Boolean columns;
# without it, recent pandas versions fail on the categorical columns
data.groupby('age').sum(numeric_only=True)

       survived  pclass  sibsp  parch      fare  adult_male  alone
age
0.42          1       3      0      1    8.5167           0      0
0.67          1       2      1      1   14.5000           0      0
0.75          2       6      4      2   38.5166           0      0
0.83          2       4      1      3   47.7500           0      0
0.92          1       1      1      2  151.5500           0      0
...         ...     ...    ...    ...       ...         ...    ...
70.00         0       3      1      1   81.5000           2      1
70.50         0       3      0      0    7.7500           1      1
71.00         0       2      0      0   84.1584           2      2
74.00         0       3      0      0    7.7750           1      1


In [51]:
# observed=False keeps class categories that have no passengers at a given age (they show up as zeros)
data.groupby(['age', 'class'], observed=False).sum(numeric_only=True)

              survived  pclass  sibsp  parch     fare  adult_male  alone
age   class
0.42  First          0       0      0      0   0.0000           0      0
      Second         0       0      0      0   0.0000           0      0
      Third          1       3      0      1   8.5167           0      0
0.67  First          0       0      0      0   0.0000           0      0
      Second         1       2      1      1  14.5000           0      0
...                ...     ...    ...    ...      ...         ...    ...
74.00 Second         0       0      0      0   0.0000           0      0
      Third          0       3      0      0   7.7750           1      1
80.00 First          1       1      0      0  30.0000           1      1
      Second         0       0      0      0   0.0000           0      0


In [52]:
data.groupby(['age' ,'class']).describe()

              survived                                   pclass       ...  parch      fare
              count  mean  std  min  25%  50%  75%  max  count  mean  ...   75%  max  count      mean        std       min       25%       50%       75%       max
age   class
0.42  Third     1.0   1.0  NaN  1.0  1.0  1.0  1.0  1.0    1.0   3.0  ...  1.00  1.0    1.0    8.5167        NaN    8.5167    8.5167    8.5167    8.5167    8.5167
0.67  Second    1.0   1.0  NaN  1.0  1.0  1.0  1.0  1.0    1.0   2.0  ...  1.00  1.0    1.0   14.5000        NaN   14.5000   14.5000   14.5000   14.5000   14.5000
0.75  Third     2.0   1.0  0.0  1.0  1.0  1.0  1.0  1.0    2.0   3.0  ...  1.00  1.0    2.0   19.2583   0.000000   19.2583   19.2583   19.2583   19.2583   19.2583
0.83  Second    2.0   1.0  0.0  1.0  1.0  1.0  1.0  1.0    2.0   2.0  ...  1.75  2.0    2.0   23.8750   7.247845   18.7500   21.3125   23.8750   26.4375   29.0000
0.92  First     1.0   1.0  NaN  1.0  1.0  1.0  1.0  1.0    1.0   1.0  ...  2.00  2.0    1.0  151.5500        NaN  151.5500  151.5500  151.5500  151.5500  151.5500
...   ...       ...   ...  ...  ...  ...  ...  ...  ...    ...   ...  ...   ...  ...    ...       ...        ...       ...       ...       ...       ...       ...
70.00 Second    1.0   0.0  NaN  0.0  0.0  0.0  0.0  0.0    1.0   2.0  ...  0.00  0.0    1.0   10.5000        NaN   10.5000   10.5000   10.5000   10.5000   10.5000
70.50 Third     1.0   0.0  NaN  0.0  0.0  0.0  0.0  0.0    1.0   3.0  ...  0.00  0.0    1.0    7.7500        NaN    7.7500    7.7500    7.7500    7.7500    7.7500
71.00 First     2.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0    2.0   1.0  ...  0.00  0.0    2.0   42.0792  10.500536   34.6542   38.3667   42.0792   45.7917   49.5042
74.00 Third     1.0   0.0  NaN  0.0  0.0  0.0  0.0  0.0    1.0   3.0  ...  0.00  0.0    1.0    7.7750        NaN    7.7750    7.7750    7.7750    7.7750    7.7750


### Aggregate, filter, transform

#### Aggregation.
#### We’re now familiar with GroupBy aggregations with sum(), median(), and the like, but the aggregate() method allows for even more flexibility.

In [53]:
rng = np.random.RandomState(0)
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
 'data1': range(6),
 'data2': rng.randint(0, 10, 6)},
 columns = ['key', 'data1', 'data2'])
df


  key  data1  data2
0   A      0      5
1   B      1      0
2   C      2      3
3   A      3      3
4   B      4      7
5   C      5      9


In [54]:
# Aggregations can be named by string; passing np.median or the built-in max triggers a warning in recent pandas versions
df.groupby('key').aggregate(['min', 'median', 'max'])

    data1            data2
      min median max   min median max
key
A       0    1.5   3     3    4.0   5
B       1    2.5   4     0    3.5   7
C       2    3.5   5     3    6.0   9
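
The aggregate() method also accepts a dictionary that maps column names to operations, and keyword arguments that name the output columns ("named aggregation"). The sketch below rebuilds the frame above with its data2 values written out literally instead of drawn from the random generator; the output names `data1_min` and `data2_range` are our own.

```python
import pandas as pd

# Same values as the frame above, with data2 written out literally
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data1': range(6),
                   'data2': [5, 0, 3, 3, 7, 9]})

# A dictionary applies a different aggregation to each column
per_column = df.groupby('key').aggregate({'data1': 'min', 'data2': 'max'})

# Named aggregation: output_name=(input_column, function)
named = df.groupby('key').agg(
    data1_min=('data1', 'min'),
    data2_range=('data2', lambda s: s.max() - s.min()),
)

print(per_column)
print(named)
```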


### Filtering. A filtering operation allows you to drop data based on the group properties.

In [55]:
# For example, we might want to keep all groups in which the standard deviation is larger than some critical value:

def filter_func(x):
    # Keep a group only if the standard deviation of its data2 values exceeds 4
    return x['data2'].std() > 4

print(df)
print(df.groupby('key').std())
print(df.groupby('key').filter(filter_func))

  key  data1  data2
0   A      0      5
1   B      1      0
2   C      2      3
3   A      3      3
4   B      4      7
5   C      5      9
       data1     data2
key                   
A    2.12132  1.414214
B    2.12132  4.949747
C    2.12132  4.242641
  key  data1  data2
1   B      1      0
2   C      2      3
4   B      4      7
5   C      5      9


### Transformation: While aggregation must return a reduced version of the data, transformation can return some transformed version of the full data to recombine.

In [56]:
df.groupby('key').transform(lambda x: x - x.mean())

   data1  data2
0   -1.5    1.0
1   -1.5   -3.5
2   -1.5   -3.0
3    1.5   -1.0
4    1.5    3.5
5    1.5    3.0
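
Because transform() returns a result with the same shape as its input, it combines naturally with the missing-value tools shown earlier. A small sketch on made-up data (the column names `value` and `filled` are our own) fills each gap with the mean of its own group:

```python
import numpy as np
import pandas as pd

# Hypothetical example data with one missing value in group A
df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'],
                   'value': [1.0, 10.0, np.nan, 30.0]})

# transform('mean') broadcasts each group's mean back to every row of that group
group_mean = df.groupby('key')['value'].transform('mean')

# Fill each missing value with the mean of the group it belongs to
df['filled'] = df['value'].fillna(group_mean)
print(df)
```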


### Pivot Tables

#### The pivot table takes simple columnwise data as input, and groups the entries into a two-dimensional table that provides a multidimensional summarization of the data

In [57]:
import numpy as np
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')
titanic.head()

   survived  pclass     sex   age  sibsp  parch     fare embarked  class    who  adult_male deck  embark_town alive  alone
0         0       3    male  22.0      1      0   7.2500        S  Third    man        True  NaN  Southampton    no  False
1         1       1  female  38.0      1      0  71.2833        C  First  woman       False    C    Cherbourg   yes  False
2         1       3  female  26.0      0      0   7.9250        S  Third  woman       False  NaN  Southampton   yes   True
3         1       1  female  35.0      1      0  53.1000        S  First  woman       False    C  Southampton   yes  False
4         0       3    male  35.0      0      0   8.0500        S  Third    man        True  NaN  Southampton    no   True


In [58]:
titanic.pivot_table('survived', index='sex', columns='class')

class      First    Second     Third
sex
female  0.968085  0.921053  0.500000
male    0.368852  0.157407  0.135447


In [59]:
# Multilevel pivot tables
# Just as in the GroupBy, the grouping in pivot tables can be specified with multiple levels, and via a number of options.
age = pd.cut(titanic['age'], [0, 18, 80])
titanic.pivot_table('survived', ['sex', age], 'class')

class               First    Second     Third
sex    age
female (0, 18]   0.909091  1.000000  0.511628
       (18, 80]  0.972973  0.900000  0.423729
male   (0, 18]   0.800000  0.600000  0.215686
       (18, 80]  0.375000  0.071429  0.133663
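
A pivot table can be thought of as a shorthand for a groupby followed by an unstack. The sketch below checks this on a tiny made-up passenger table (not the real Titanic data), relying on the fact that pivot_table aggregates with the mean by default.

```python
import pandas as pd

# Hypothetical miniature passenger table
people = pd.DataFrame({'sex': ['female', 'female', 'male', 'male', 'male'],
                       'class': ['First', 'Third', 'First', 'Third', 'Third'],
                       'survived': [1, 0, 0, 1, 0]})

# Long form: group by both keys, take the mean, move 'class' into the columns
by_group = people.groupby(['sex', 'class'])['survived'].mean().unstack()

# Short form: the same table with pivot_table (default aggregation is the mean)
by_pivot = people.pivot_table('survived', index='sex', columns='class')

print(by_group)
print(by_pivot)
```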


### Introducing Pandas String Operations
One strength of Python is its relative ease in handling and manipulating string data.

In [60]:
d1 = pd.Series(data = ['peter', 'Paul', None, 'MARY', 'gUIDO'])
d1

0    peter
1     Paul
2     None
3     MARY
4    gUIDO
dtype: object

In [61]:
# We can now call a single method that will capitalize all the entries, while skipping over any missing values:
d1.str.capitalize()  

0    Peter
1     Paul
2     None
3     Mary
4    Guido
dtype: object

### Methods similar to Python string methods
Nearly all Python’s built-in string methods are mirrored by a Pandas vectorized string
method. Here is a list of Pandas str methods that mirror Python string methods:

![string_methods](string_methods.PNG)

In [62]:
# Compute the length of each string
d1.str.len()

0    5.0
1    4.0
2    NaN
3    4.0
4    5.0
dtype: float64

In [63]:
d1.str.startswith('T')


0    False
1    False
2     None
3    False
4    False
dtype: object
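
The missing entry in the result above means it cannot be used directly as a Boolean mask. As a minimal sketch, the string methods that return Booleans accept an `na` argument to decide how missing values are treated, and the methods can be chained for a case-insensitive match:

```python
import pandas as pd

names = pd.Series(['peter', 'Paul', None, 'MARY', 'gUIDO'])

# Normalise the case first so that the comparison is case-insensitive
lowered = names.str.lower()

# na=False makes missing entries count as "no match", giving a clean Boolean mask
mask = lowered.str.startswith('p', na=False)

print(names[mask])
```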

### Dates and Times in Python
The Python world has a number of available representations of dates, times, deltas, and timespans.

#### Native Python dates and times: datetime and dateutil
Python's basic objects for working with dates and times reside in the built-in datetime module. Along with the third-party dateutil module, you can use it to quickly perform a host of useful operations on dates and times. For example, you can manually build a date using the datetime type:


In [64]:
from datetime import datetime
print(datetime(year=2015, month=7, day=4))

# Or, using the dateutil module, you can parse dates from a variety of string formats:
from dateutil import parser
date = parser.parse("4th of July, 2015")
print(date)

# Once you have a datetime object, you can do things like printing the day of the week:
print(date.strftime('%A'))


2015-07-04 00:00:00
2015-07-04 00:00:00
Saturday


#### Typed arrays of times: NumPy's datetime64


In [65]:
import numpy as np
date = np.array('2015-07-04', dtype=np.datetime64)
date


array('2015-07-04', dtype='datetime64[D]')

In [66]:
np.datetime64('2015-07-04 12:00')

numpy.datetime64('2015-07-04T12:00')
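
The point of a typed date array is that arithmetic on it is vectorized, just like arithmetic on numeric arrays. A short sketch (the variable names `next_days` and `delta` are our own):

```python
import numpy as np

date = np.array('2015-07-04', dtype=np.datetime64)

# Adding an integer array shifts the date by that many days, element by element
next_days = date + np.arange(3)
print(next_days)

# Subtracting two datetime64 values gives a timedelta64
delta = np.datetime64('2015-07-10') - np.datetime64('2015-07-04')
print(delta)
```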

#### Description of date and time codes

![dat_time](date_time.PNG)


![date_time1](date_time1.PNG)

### Dates and times in Pandas: Best of both worlds

In [67]:
import pandas as pd
date = pd.to_datetime("4th of July, 2015")
date

Timestamp('2015-07-04 00:00:00')

In [68]:
# Generate a sequence of consecutive dates with pd.date_range
data = pd.date_range('2012-2-2',  periods=10)
data

DatetimeIndex(['2012-02-02', '2012-02-03', '2012-02-04', '2012-02-05',
               '2012-02-06', '2012-02-07', '2012-02-08', '2012-02-09',
               '2012-02-10', '2012-02-11'],
              dtype='datetime64[ns]', freq='D')

### Pandas Time Series: Indexing by Time
Where the Pandas time series tools really become useful is when you begin to index
data by timestamps.


In [69]:
index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
 '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)
data


2014-07-04    0
2014-08-04    1
2015-07-04    2
2015-08-04    3
dtype: int64
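
With a DatetimeIndex in place, the familiar Series indexing patterns accept date strings. The sketch below rebuilds the Series from the cell above; `between` and `in_2015` are illustrative names.

```python
import pandas as pd

index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)

# Slice between two dates; as with other label slices, both end points are included
between = data.loc['2014-07-04':'2015-07-04']

# Partial string indexing: passing only a year selects every entry from that year
in_2015 = data.loc['2015']

print(between)
print(in_2015)
```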

#### Listing of Pandas frequency codes

![frequecy_code](frequecy_code.PNG)
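
Frequency codes are passed through the `freq` argument of functions such as pd.date_range. The sketch below sticks to the daily ('D') and weekly ('W') codes; some other aliases have been renamed in recent pandas releases, so check the documentation for the version you have installed.

```python
import pandas as pd

# A multiple of a code is allowed: every second day, starting on the given date
every_other_day = pd.date_range('2015-07-04', periods=3, freq='2D')

# 'W' means weekly, anchored on Sundays by default, so the range starts
# on the first Sunday on or after the given date
weekly = pd.date_range('2015-07-04', periods=3, freq='W')

print(every_other_day)
print(weekly)
```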

This is not the end. Data manipulation with Pandas offers endless opportunities for data analysis.
Keep practicing and go deeper into Pandas to master it.
Thank you!