## Series

A Series is a one-dimensional labeled data structure. It holds two aligned arrays: one with the data values and one with the index (the labels of those values).

In [1]:
import pandas as pd
simple_series = pd.Series([1,2,3,4,5])
indexed_series = pd.Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
print(simple_series)
print(indexed_series)

0    1
1    2
2    3
3    4
4    5
dtype: int64
a    1
b    2
c    3
d    4
e    5
dtype: int64


In [4]:
print(indexed_series.values)
print(indexed_series.index)

[1 2 3 4 5]
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')


Another way to create an indexed Series is to pass in a dictionary; the keys become the index.

In [5]:
another_series = pd.Series({'a': 1, 'b': 2, 'c': 3})
print(another_series)

a    1
b    2
c    3
dtype: int64


### Accessing Elements of a Series

In [6]:
indexed_series['b']

2

In [7]:
indexed_series.iloc[1]    # by position; plain [1] on a labelled index is deprecated in recent pandas

2

In [8]:
indexed_series[:2]

a    1
b    2
dtype: int64

In [9]:
indexed_series[['a', 'b', 'c']]

a    1
b    2
c    3
dtype: int64
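Plain square brackets accept both labels and positions, which becomes ambiguous when the index itself contains integers. The sketch below uses a made-up Series (`int_indexed`, not part of the notebook above) to show how the `.loc` and `.iloc` indexers make the intent explicit.

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from the positions
int_indexed = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = int_indexed.loc[0]      # value stored under the label 0
by_position = int_indexed.iloc[0]  # value at the first position

print(by_label)     # 20
print(by_position)  # 10
```

Using `.loc` for labels and `.iloc` for positions avoids having to remember which rule plain `[]` applies in a given case.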

In [16]:
# We can also pass a boolean expression inside []. Essentially,
# we pass a series of boolean values: indexed_series > 1
# produces a series of boolean values, and only the True entries are kept
indexed_series[indexed_series > 1]

b    2
c    3
d    4
e    5
dtype: int64
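To make the intermediate step visible, the following sketch prints the boolean mask on its own and then combines two conditions. It rebuilds `indexed_series` so that it runs standalone; the names `mask` and `middle` are only illustrative.

```python
import pandas as pd

indexed_series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# The comparison yields a boolean Series aligned on the same index
mask = indexed_series > 1
print(mask)

# Conditions are combined with & (and), | (or) and ~ (not);
# each condition needs its own parentheses
middle = indexed_series[(indexed_series > 1) & (indexed_series < 5)]
print(middle)
```

Python's `and`/`or` keywords do not work element-wise on a Series, which is why the bitwise operators are used here.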

### Some Operations on Series

In [11]:
# List all the unique values in a series
indexed_series.unique()

array([1, 2, 3, 4, 5], dtype=int64)

In [12]:
# Count how many times each value occurs in a series
a_series = pd.Series([1,1,2,2,2,3,4,5,6,6,6,6])
a_series.value_counts()

6    4
2    3
1    2
5    1
4    1
3    1
dtype: int64

In [15]:
# Check which values are null/NaN
import numpy as np
series_with_null = pd.Series([np.nan, 1, 2, np.nan])
series_with_null.isnull()

0     True
1    False
2    False
3     True
dtype: bool

In [19]:
series_without_null = series_with_null[series_with_null.notnull()]
print(series_without_null)

1    1.0
2    2.0
dtype: float64
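Filtering with `notnull()` is one option; pandas also provides `dropna()` and `fillna()` for the same kind of clean-up. A minimal sketch, recreating `series_with_null` so it runs on its own (the fill value `0` is an arbitrary choice for illustration):

```python
import numpy as np
import pandas as pd

series_with_null = pd.Series([np.nan, 1, 2, np.nan])

# dropna() removes the missing entries, like filtering with notnull()
dropped = series_with_null.dropna()

# fillna() keeps every position and replaces the missing entries instead
filled = series_with_null.fillna(0)

print(dropped)
print(filled)
```

Both calls return a new Series and leave `series_with_null` unchanged.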


## DataFrame
A DataFrame represents two-dimensional data, similar to an Excel sheet. Like a Series, a DataFrame has an index for its rows; it additionally has column labels. We can create a DataFrame in a number of ways.

### Creating DataFrame

In [21]:
simple_df = pd.DataFrame({
    'A': [1,2,3,4,5],    # note that in this case dict key represents a column
    'B': [6,7,8,9,10]
})
print(simple_df)

   A   B
0  1   6
1  2   7
2  3   8
3  4   9
4  5  10


In [24]:
indexed_df = pd.DataFrame({
    'A': [1,2,3,4,5],    
    'B': [6,7,8,9,10]
}, index = ['a','b','c','d','e'])
print(indexed_df)

   A   B
a  1   6
b  2   7
c  3   8
d  4   9
e  5  10


In [25]:
# Creating dataframe using a nested dictionary
new_df = pd.DataFrame({
    'red': {2012:1, 2013:2},
    'white': {2011:5, 2012:3, 2013:3},
    'blue': {2011:4, 2015:1}
})
print(new_df)

      red  white  blue
2012  1.0    3.0   NaN
2013  2.0    3.0   NaN
2011  NaN    5.0   4.0
2015  NaN    NaN   1.0


In [26]:
# Creating a DataFrame from a NumPy array
array_df = pd.DataFrame(np.arange(16).reshape(4,4),
                       index=['a','b','c','d'],
                       columns=['A','B','C','D'])
print(array_df)

    A   B   C   D
a   0   1   2   3
b   4   5   6   7
c   8   9  10  11
d  12  13  14  15


In [2]:
# CSV To DataFrame
pokemon_df = pd.read_csv('Data/pokemon.csv')

## Viewing and Selecting Data from DataFrame

In [32]:
pokemon_df.head()

,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


In [34]:
pokemon_df['Name'][1]

'Ivysaur'

In [35]:
pokemon_df[:3]    # pokemon_df[3] would fail: a single key is looked up among the column labels

,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False


In [36]:
hp_series = pokemon_df['HP']
type(hp_series)

pandas.core.series.Series

In [37]:
# Selecting only a few columns
name_hp_df = pokemon_df[['Name', 'HP']]
name_hp_df.head(3)

        Name  HP
0  Bulbasaur  45
1    Ivysaur  60
2   Venusaur  80


In [39]:
# To select a row by its index label, use loc
pokemon_df.loc[5]

#                      5
Name          Charmeleon
Type 1              Fire
Type 2               NaN
Total                405
HP                    58
Attack                64
Defense               58
Sp. Atk               80
Sp. Def               65
Speed                 80
Generation             1
Legendary          False
Name: 5, dtype: object

In [40]:
type(pokemon_df.loc[5])

pandas.core.series.Series

In [42]:
# Here a string index is more convenient than the default integer one.
# Many pandas methods accept inplace=True, which modifies the current
# Series/DataFrame instead of returning a new one
pokemon_df.set_index(pokemon_df['Name'], inplace = True)
pokemon_df.loc['Pikachu']

#                   25
Name           Pikachu
Type 1        Electric
Type 2             NaN
Total              320
HP                  35
Attack              55
Defense             40
Sp. Atk             50
Sp. Def             50
Speed               90
Generation           1
Legendary        False
Name: Pikachu, dtype: object
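As a rough illustration of the difference between modifying in place and returning a new object, the sketch below calls `set_index` without `inplace` on a tiny hand-made DataFrame (`toy_df` and its two rows are assumptions for this example, not the Pokemon CSV).

```python
import pandas as pd

# Hypothetical two-row stand-in for the Pokemon data
toy_df = pd.DataFrame({'Name': ['Pikachu', 'Raichu'], 'HP': [35, 60]})

# Without inplace=True a new DataFrame is returned; 'Name' moves into the index
indexed = toy_df.set_index('Name')

# drop=False keeps 'Name' as a regular column as well
kept = toy_df.set_index('Name', drop=False)

# reset_index() turns the index back into an ordinary column
restored = indexed.reset_index()

print(indexed.loc['Pikachu', 'HP'])  # 35
print(list(kept.columns))            # ['Name', 'HP']
print(list(toy_df.index))            # [0, 1], the original is untouched
```

Returning a new object makes it easy to keep the original around, at the cost of one extra variable.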

In [43]:
# If we still want to select rows by position, we can use the iloc indexer
pokemon_df.iloc[25]

#                   20
Name          Raticate
Type 1          Normal
Type 2             NaN
Total              413
HP                  55
Attack              81
Defense             60
Sp. Atk             50
Sp. Def             70
Speed               97
Generation           1
Legendary        False
Name: Raticate, dtype: object

In [44]:
pokemon_df.loc['Ivysaur':'Venusaur']

Name,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Ivysaur,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
Venusaur,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
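Note that a label slice with `loc` includes both endpoints, whereas a positional slice with `iloc` excludes the stop position, like ordinary Python slicing. A small sketch on a hypothetical four-row frame (`small_df` is made up for illustration):

```python
import pandas as pd

# Hypothetical frame indexed by name
small_df = pd.DataFrame(
    {'HP': [45, 60, 80, 39]},
    index=['Bulbasaur', 'Ivysaur', 'Venusaur', 'Charmander'],
)

# Label slice: both 'Ivysaur' and 'Venusaur' are included
label_slice = small_df.loc['Ivysaur':'Venusaur']

# Positional slice: position 1 is included, position 2 is not
position_slice = small_df.iloc[1:2]

print(len(label_slice))     # 2
print(len(position_slice))  # 1
```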


In [45]:
pokemon_df.iloc[[11,12,13]]

Name,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Blastoise,9,Blastoise,Water,,530,79,83,100,85,105,78,1,False
BlastoiseMega Blastoise,9,BlastoiseMega Blastoise,Water,,630,79,103,120,135,115,78,1,False
Caterpie,10,Caterpie,Bug,,195,45,30,35,20,20,45,1,False


### Conditional Selection

In [47]:
pokemon_df[pokemon_df['HP'] > 50].head()

Name,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Ivysaur,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
Venusaur,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
VenusaurMega Venusaur,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
Charmeleon,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
Charizard,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False


In [48]:
pokemon_df[pokemon_df['Type 2'].isin(['Rock', 'Electric'])].head()

Name,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Rhyhorn,111,Rhyhorn,Ground,Rock,345,80,85,95,30,30,25,1,False
Rhydon,112,Rhydon,Ground,Rock,485,105,130,120,45,45,40,1,False
Chinchou,170,Chinchou,Water,Electric,330,75,38,38,56,56,67,2,False
Lanturn,171,Lanturn,Water,Electric,460,125,58,58,76,76,67,2,False
Shuckle,213,Shuckle,Bug,Rock,505,20,10,230,10,230,5,2,False


In [70]:
pokemon_df[pokemon_df['Type 1'].str.startswith('F')].head()

Name,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Charmander,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
Charmeleon,5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
Charizard,6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
CharizardMega Charizard X,6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
CharizardMega Charizard Y,6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False


In [71]:
pokemon_df[~pokemon_df['Type 1'].str.startswith('F')].head()

Name,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Bulbasaur,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
Ivysaur,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
Venusaur,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
VenusaurMega Venusaur,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
Squirtle,7,Squirtle,Water,,314,44,48,65,50,64,43,1,False


In [72]:
pokemon_df[pokemon_df['Name'].str.contains('chu')].head()

Name,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Pikachu,25,Pikachu,Electric,,320,35,55,40,50,50,90,1,False
Raichu,26,Raichu,Electric,,485,60,90,55,90,80,110,1,False
Pichu,172,Pichu,Electric,,205,20,40,15,35,35,60,2,False
Smoochum,238,Smoochum,Ice,Psychic,305,45,30,15,85,65,65,2,False


### Adding and Removing Data

In [49]:
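df = pd.DataFrame([[5, 6], [1.2, 3]])
ser = pd.Series([0, 0], name='r3')

# DataFrame.append was removed in pandas 2.0; turn the Series into a
# one-row DataFrame and use pd.concat instead
df_app = pd.concat([df, ser.to_frame().T])
print(df_app)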

      0  1
0   5.0  6
1   1.2  3
r3  0.0  0


In [50]:
df = pd.DataFrame({'c1': [1, 2], 'c2': [3, 4],
                   'c3': [5, 6]},
                  index=['r1', 'r2'])
# Drop row r1
df_drop = df.drop(labels='r1')
print('{}\n'.format(df_drop))

# Drop columns c1, c3
df_drop = df.drop(labels=['c1', 'c3'], axis=1)
print('{}\n'.format(df_drop))

    c1  c2  c3
r2   2   4   6

    c2
r1   3
r2   4



The operations above return a new DataFrame; they do not modify the original.
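A short check of this behaviour, as a sketch on a small frame similar to the one used above:

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2], 'c2': [3, 4]}, index=['r1', 'r2'])

# drop() returns a new DataFrame; df itself still has both rows
df_drop = df.drop(labels='r1')
print(df.shape)       # (2, 2)
print(df_drop.shape)  # (1, 2)

# To make the change stick, assign the result back to the same name
df = df.drop(columns='c2')
print(list(df.columns))  # ['c1']
```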

In [3]:
# Remove duplicates
pokemon_df.drop_duplicates(inplace=True)

In [4]:
# To add a column
pokemon_df['Active'] = True
pokemon_df.head()

,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Active
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False,True
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False,True
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False,True
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False,True
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False,True


In [5]:
# To remove a column
del pokemon_df['Active']

### Combining DataFrames

In [54]:
df1 = pd.DataFrame({'c1':[1,2], 'c2':[3,4]},
                   index=['r1','r2'])
df2 = pd.DataFrame({'c1':[5,6], 'c2':[7,8]},
                   index=['r1','r2'])
df3 = pd.DataFrame({'c1':[5,6], 'c2':[7,8]})
pd.concat([df2, df1, df3])

    c1  c2
r1   5   7
r2   6   8
r1   1   3
r2   2   4
0    5   7
1    6   8


In [55]:
# We can also concat along columns
pd.concat([df2, df1, df3], axis=1)

     c1   c2   c1   c2   c1   c2
r1  5.0  7.0  1.0  3.0  NaN  NaN
r2  6.0  8.0  2.0  4.0  NaN  NaN
0   NaN  NaN  NaN  NaN  5.0  7.0
1   NaN  NaN  NaN  NaN  6.0  8.0


Another function, `merge`, combines two DataFrames by joining them on shared columns or indexes, much like a SQL join.
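A minimal sketch of `merge`, using two made-up frames (`left`, `right` and the `key` column are assumptions for illustration and do not come from the notebook's data):

```python
import pandas as pd

# Hypothetical frames that share a 'key' column
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'left_val': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'right_val': [20, 30, 40]})

# Default is an inner join: only keys present in both frames survive
inner = pd.merge(left, right, on='key')
print(inner)

# how='outer' keeps every key and fills the gaps with NaN
outer = pd.merge(left, right, on='key', how='outer')
print(outer)
```

Unlike `concat`, which stacks frames along an axis, `merge` matches rows by the values in the key column.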

### Grouping

In [57]:
groups = pokemon_df.groupby('Type 1')
type(groups)

pandas.core.groupby.generic.DataFrameGroupBy

In [61]:
for name, group in groups:
    print(f'Name: {name}')

Name: Bug
Name: Dark
Name: Dragon
Name: Electric
Name: Fairy
Name: Fighting
Name: Fire
Name: Flying
Name: Ghost
Name: Grass
Name: Ground
Name: Ice
Name: Normal
Name: Poison
Name: Psychic
Name: Rock
Name: Steel
Name: Water


In [63]:
groups.get_group('Ghost').head()

Name,#,Name,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
Gastly,92,Gastly,Poison,310,30,35,30,100,35,80,1,False
Haunter,93,Haunter,Poison,405,45,50,45,115,55,95,1,False
Gengar,94,Gengar,Poison,500,60,65,60,130,75,110,1,False
GengarMega Gengar,94,GengarMega Gengar,Poison,600,60,65,80,170,95,130,1,False
Misdreavus,200,Misdreavus,,435,60,60,60,85,85,85,2,False


In [65]:
# Select the numeric columns first so that only they are summed
groups[['HP', 'Attack', 'Defense', 'Speed']].sum()

            HP  Attack  Defense  Speed
Type 1
Bug       3925    4897     4880   4256
Dark      2071    2740     2177   2361
Dragon    2666    3588     2764   2657
Electric  2631    3040     2917   3718
Fairy     1260    1046     1117    826
Fighting  1886    2613     1780   1784
Fire      3635    4408     3524   3871
Flying     283     315      265    410
Ghost     2062    2361     2598   2059
Grass     4709    5125     4956   4335


In [67]:
groups['#'].count()

Type 1
Bug          69
Dark         31
Dragon       32
Electric     44
Fairy        17
Fighting     27
Fire         52
Flying        4
Ghost        32
Grass        70
Ground       32
Ice          24
Normal       98
Poison       28
Psychic      57
Rock         44
Steel        27
Water       112
Name: #, dtype: int64
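Several aggregations can also be computed in one pass with `agg`. The sketch below uses a tiny hand-made frame (`toy_df` and its values are assumptions for illustration, not the real dataset) and named aggregation, where each keyword becomes an output column.

```python
import pandas as pd

# Hypothetical miniature of the Pokemon data
toy_df = pd.DataFrame({
    'Type 1': ['Grass', 'Grass', 'Fire', 'Fire', 'Water'],
    'HP': [45, 60, 39, 58, 44],
    'Attack': [49, 62, 52, 64, 48],
})

# output_column=(input_column, aggregation)
summary = toy_df.groupby('Type 1').agg(
    mean_hp=('HP', 'mean'),
    max_attack=('Attack', 'max'),
    count=('HP', 'size'),
)
print(summary)
```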

### Sorting
We can sort a DataFrame by its row index, by its column labels, or by the values in one of its columns.

In [6]:
import numpy as np
frame = pd.DataFrame(np.random.randint(1,100,(4,4)),
                    index = ['red','blue','yellow','white'],
                    columns = ['ball','pen','paper','pencil'])
frame

        ball  pen  paper  pencil
red       67   27     27       3
blue       8   71     45      15
yellow     1    6     64      43
white     49   14     78      84


In [7]:
frame.sort_index()

        ball  pen  paper  pencil
blue       8   71     45      15
red       67   27     27       3
white     49   14     78      84
yellow     1    6     64      43


In [8]:
frame.sort_index(axis=1, ascending=False)

        pencil  pen  paper  ball
red          3   27     27    67
blue        15   71     45     8
yellow      43    6     64     1
white       84   14     78    49


In [10]:
frame.sort_values(by='pencil')

        ball  pen  paper  pencil
red       67   27     27       3
blue       8   71     45      15
yellow     1    6     64      43
white     49   14     78      84
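`sort_values` also accepts a sort direction and more than one column. A brief sketch on a fixed toy frame (`toy_frame` is made up so that the result is reproducible, unlike the random `frame` above):

```python
import pandas as pd

# Hypothetical frame with a tie in the 'ball' column
toy_frame = pd.DataFrame(
    {'ball': [1, 1, 2], 'pen': [5, 3, 4]},
    index=['red', 'blue', 'yellow'],
)

# Largest 'ball' value first
by_ball_desc = toy_frame.sort_values(by='ball', ascending=False)
print(by_ball_desc)

# Sort by 'ball' descending, then break ties by 'pen' ascending
multi = toy_frame.sort_values(by=['ball', 'pen'], ascending=[False, True])
print(multi)
```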
