# <font color=hotpink> Python Pandas </font>

* Pandas is a Python library for working with structured data stored in rows and columns.
* According to the Wikipedia page on Pandas, the name is derived from "*panel data*", a term for multidimensional structured data sets.
* It is a robust toolkit for analyzing, filtering, manipulating, aggregating, merging, pivoting, and cleaning data.
* It can be described, metaphorically, as "*Excel for Python*" or "*Excel on steroids*".

In [1]:
import pandas as pd
import numpy as np

In [2]:
pd.__version__

'1.5.2'

In [3]:
np.__version__

'1.24.0'

## <font color=fe7401> Series </font>

* A Series is a one-dimensional labeled array that can hold data of any type.
* It combines features of Python's list and dictionary data structures.
* Like a list, it stores values in order; like a dictionary, it assigns an identifier (label) to each value.
* By default, Pandas assigns the integers 0, 1, 2, ... as the index labels.
* Unlike a dictionary, a Series can have duplicate keys.


In [4]:
# creation of series

ice_cream = ['Vanilla', 'Chocolate', 'Strawberry']
pd.Series(ice_cream)

0       Vanilla
1     Chocolate
2    Strawberry
dtype: object

In [5]:
# series of int data type

pd.Series([11, 22, 33, 44])

0    11
1    22
2    33
3    44
dtype: int64

In [6]:
# series of heterogeneous dtype

pd.Series(["Apple", 12, 99.56, True])

0    Apple
1       12
2    99.56
3     True
dtype: object

In [7]:
# series from dictionary

runs_score = {
    "Sachin": 99,
    "Virat": 56,
    "Brown": 89
}

pd.Series(runs_score)

Sachin    99
Virat     56
Brown     89
dtype: int64

In [8]:
# parameter - the name we give to expected input
# argument - concrete value we provide to a parameter

# Difficulty (param) - Easy, Medium, Hard (arguments)
# pd.Series(data = dataSrc, index = idx)
# param: data, index
# arg: dataSrc, idx

In [9]:
# duplicate keys allowed in series

weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
fruits = ['Apple', 'Mango', 'Kiwi', 'Strawberry', 'Apple']

pd.Series(data = weekdays, index = fruits)   # keyword args

Apple            Monday
Mango           Tuesday
Kiwi          Wednesday
Strawberry     Thursday
Apple            Friday
dtype: object
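
Because labels may repeat, looking one up can return more than one value. The sketch below reuses the `weekdays` and `fruits` lists from the cell above; the variable names `days_by_fruit`, `single`, and `repeated` are introduced here only for illustration.

```python
import pandas as pd

weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
fruits = ["Apple", "Mango", "Kiwi", "Strawberry", "Apple"]
days_by_fruit = pd.Series(data=weekdays, index=fruits)

# A label that occurs once returns a single value.
single = days_by_fruit["Mango"]

# A label that occurs more than once returns a Series with every match.
repeated = days_by_fruit["Apple"]

print(single)                         # Tuesday
print(list(repeated))                 # ['Monday', 'Friday']
print(days_by_fruit.index.is_unique)  # False
```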

### <font color=blue> Methods on Series </font>

In [10]:
prices = pd.Series([12, 23.56, 8.5])
prices

0    12.00
1    23.56
2     8.50
dtype: float64

In [11]:
prices.sum()

44.06

In [12]:
prices.product()

2403.12

In [13]:
prices.mean()

14.686666666666667

In [14]:
prices.isnull()

0    False
1    False
2    False
dtype: bool

### <font color=blue> Attributes on Series </font>

* Driving a car is an action (a method), whereas the car's color is a detail (an attribute).
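
In code, the difference shows up as parentheses: an attribute is read directly, while a method must be called. A minimal sketch using the `prices` values from earlier (the names `n_items`, `total`, and `not_called` are illustrative):

```python
import pandas as pd

prices = pd.Series([12, 23.56, 8.5])

# Attribute: read without parentheses; it describes the object.
n_items = prices.size

# Method: called with parentheses; it performs a computation.
total = prices.sum()

# Omitting the parentheses gives the method object itself, not its result.
not_called = prices.sum

print(n_items)               # 3
print(callable(not_called))  # True
```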

In [15]:
adjectives = pd.Series(["Good", "Handsome", "Smart", "Tall"])
adjectives

0        Good
1    Handsome
2       Smart
3        Tall
dtype: object

In [16]:
adjectives.size

4

In [17]:
adjectives.is_unique

True

In [18]:
adjectives.values

array(['Good', 'Handsome', 'Smart', 'Tall'], dtype=object)

In [19]:
type(adjectives.values)

numpy.ndarray

In [20]:
adjectives.index

RangeIndex(start=0, stop=4, step=1)

In [21]:
type(adjectives.index)

pandas.core.indexes.range.RangeIndex

In [22]:
adjectives.dtype

dtype('O')

### <font color=blue> Import Series with pd.read_csv() </font>

* `pd.read_csv()` returns a DataFrame object.
* In notebook output, a DataFrame renders as a table with alternating row colors, whereas a Series renders as plain black-and-white text.

In [23]:
pd.read_csv("./datasets/pokemon.csv")  

        Pokemon     Type
0     Bulbasaur    Grass
1       Ivysaur    Grass
2      Venusaur    Grass
3    Charmander     Fire
4    Charmeleon     Fire
...         ...      ...
716     Yveltal     Dark
717     Zygarde   Dragon
718     Diancie     Rock
719       Hoopa  Psychic


In [24]:
# to get a Series, i.e. a 1-D array

pokemon = pd.read_csv("./datasets/pokemon.csv", usecols=["Pokemon"]).squeeze("columns")
pokemon

0       Bulbasaur
1         Ivysaur
2        Venusaur
3      Charmander
4      Charmeleon
          ...    
716       Yveltal
717       Zygarde
718       Diancie
719         Hoopa
720     Volcanion
Name: Pokemon, Length: 721, dtype: object
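
To try the same pattern without the dataset file, the sketch below reads a small in-memory CSV instead. The three rows in `csv_text` are a made-up sample for illustration, not the contents of `pokemon.csv`.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for ./datasets/pokemon.csv
csv_text = "Pokemon,Type\nBulbasaur,Grass\nCharmander,Fire\nSquirtle,Water\n"

# read_csv always returns a DataFrame, even for a single column.
frame = pd.read_csv(io.StringIO(csv_text))

# Selecting one column and squeezing along "columns" yields a Series.
names = pd.read_csv(io.StringIO(csv_text), usecols=["Pokemon"]).squeeze("columns")

print(type(frame).__name__)  # DataFrame
print(type(names).__name__)  # Series
print(names.name)            # Pokemon
```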

### <font color=blue> head and tail methods on Series </font>

In [25]:
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Pokemon, dtype: object

In [26]:
pokemon.head(2)

0    Bulbasaur
1      Ivysaur
Name: Pokemon, dtype: object

In [27]:
pokemon.tail()

716      Yveltal
717      Zygarde
718      Diancie
719        Hoopa
720    Volcanion
Name: Pokemon, dtype: object

### <font color=blue> Views versus Copies </font>

* Some operations in pandas (and NumPy) return views of the original data, while others return copies.
* Put simply, a view is a subset of the original object (DataFrame or Series) that remains linked to the original source, while a copy is an entirely new object.
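
A minimal sketch of the distinction. The view half uses a NumPy slice, where view semantics are well defined; the copy half uses `Series.copy()`. Whether a given pandas indexing operation returns a view or a copy depends on the operation and on the pandas version, so `.copy()` is the explicit way to get an independent object.

```python
import numpy as np
import pandas as pd

# NumPy: a basic slice is a view, so writing through it changes the source.
source = np.array([10, 20, 30, 40])
view = source[1:3]
view[0] = 99
print(source)                         # [10 99 30 40]
print(np.shares_memory(source, view)) # True

# pandas: .copy() returns an independent object.
original = pd.Series([1, 2, 3])
duplicate = original.copy()
duplicate.iloc[0] = 100
print(original.iloc[0])               # 1
```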

### <font color=blue>Passing Series to Python built-in functions </font>

In [28]:
len(pokemon)

721

In [29]:
sorted(pokemon)

['Abomasnow',
 'Abra',
 'Absol',
 'Accelgor',
 'Aegislash',
 'Aerodactyl',
 'Aggron',
 'Aipom',
 'Alakazam',
 'Alomomola',
 'Altaria',
 'Amaura',
 'Ambipom',
 'Amoonguss',
 'Ampharos',
 'Anorith',
 'Arbok',
 'Arcanine',
 'Arceus',
 'Archen',
 'Archeops',
 'Ariados',
 'Armaldo',
 'Aromatisse',
 'Aron',
 'Articuno',
 'Audino',
 'Aurorus',
 'Avalugg',
 'Axew',
 'Azelf',
 'Azumarill',
 'Azurill',
 'Bagon',
 'Baltoy',
 'Banette',
 'Barbaracle',
 'Barboach',
 'Basculin',
 'Bastiodon',
 'Bayleef',
 'Beartic',
 'Beautifly',
 'Beedrill',
 'Beheeyem',
 'Beldum',
 'Bellossom',
 'Bellsprout',
 'Bergmite',
 'Bibarel',
 'Bidoof',
 'Binacle',
 'Bisharp',
 'Blastoise',
 'Blaziken',
 'Blissey',
 'Blitzle',
 'Boldore',
 'Bonsly',
 'Bouffalant',
 'Braixen',
 'Braviary',
 'Breloom',
 'Bronzong',
 'Bronzor',
 'Budew',
 'Buizel',
 'Bulbasaur',
 'Buneary',
 'Bunnelby',
 'Burmy',
 'Butterfree',
 'Cacnea',
 'Cacturne',
 'Camerupt',
 'Carbink',
 'Carnivine',
 'Carracosta',
 'Carvanha',
 'Cascoon',
 'Castform',


In [30]:
max(pokemon)

'Zygarde'

In [31]:
dict(pokemon)

{0: 'Bulbasaur',
 1: 'Ivysaur',
 2: 'Venusaur',
 3: 'Charmander',
 4: 'Charmeleon',
 5: 'Charizard',
 6: 'Squirtle',
 7: 'Wartortle',
 8: 'Blastoise',
 9: 'Caterpie',
 10: 'Metapod',
 11: 'Butterfree',
 12: 'Weedle',
 13: 'Kakuna',
 14: 'Beedrill',
 15: 'Pidgey',
 16: 'Pidgeotto',
 17: 'Pidgeot',
 18: 'Rattata',
 19: 'Raticate',
 20: 'Spearow',
 21: 'Fearow',
 22: 'Ekans',
 23: 'Arbok',
 24: 'Pikachu',
 25: 'Raichu',
 26: 'Sandshrew',
 27: 'Sandslash',
 28: 'Nidoran',
 29: 'Nidorina',
 30: 'Nidoqueen',
 31: 'Nidoran♂',
 32: 'Nidorino',
 33: 'Nidoking',
 34: 'Clefairy',
 35: 'Clefable',
 36: 'Vulpix',
 37: 'Ninetales',
 38: 'Jigglypuff',
 39: 'Wigglytuff',
 40: 'Zubat',
 41: 'Golbat',
 42: 'Oddish',
 43: 'Gloom',
 44: 'Vileplume',
 45: 'Paras',
 46: 'Parasect',
 47: 'Venonat',
 48: 'Venomoth',
 49: 'Diglett',
 50: 'Dugtrio',
 51: 'Meowth',
 52: 'Persian',
 53: 'Psyduck',
 54: 'Golduck',
 55: 'Mankey',
 56: 'Primeape',
 57: 'Growlithe',
 58: 'Arcanine',
 59: 'Poliwag',
 60: 'Poliwhirl',


In [32]:
type(pokemon)

pandas.core.series.Series

### <font color=blue> sort_values method </font>

In [33]:
pokemon.sort_values().head()

459    Abomasnow
62          Abra
358        Absol
616     Accelgor
680    Aegislash
Name: Pokemon, dtype: object

In [34]:
pokemon.sort_values(ascending = False).head()

717     Zygarde
633    Zweilous
40        Zubat
569       Zorua
570     Zoroark
Name: Pokemon, dtype: object

### <font color=blue> sort_index method </font>

In [35]:
poke_idx = pd.read_csv("./datasets/pokemon.csv", index_col = "Type").squeeze("columns")
poke_idx

Type
Grass       Bulbasaur
Grass         Ivysaur
Grass        Venusaur
Fire       Charmander
Fire       Charmeleon
              ...    
Dark          Yveltal
Dragon        Zygarde
Rock          Diancie
Psychic         Hoopa
Fire        Volcanion
Name: Pokemon, Length: 721, dtype: object

In [36]:
poke_idx.sort_index()

Type
Bug        Leavanny
Bug           Burmy
Bug       Scolipede
Bug        Genesect
Bug      Kricketune
            ...    
Water       Panpour
Water          Seel
Water       Dewgong
Water      Politoed
Water      Chinchou
Name: Pokemon, Length: 721, dtype: object

### <font color=blue> Check for inclusion using 'in' keyword </font>

* By default, the `in` keyword checks a Series's index labels, not its values.

In [37]:
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Pokemon, dtype: object

In [38]:
'Bulbasaur' in pokemon

False

In [39]:
4 in pokemon

True

In [40]:
4 in pokemon.index

True

In [41]:
'Bulbasaur' in pokemon.values

True

### <font color=blue> Extract Series value by Indexing </font>

* With the default integer index, a negative integer key such as `sr[-1]` or `sr[-20]` is looked up as a label and raises a `KeyError`, but negative slicing such as `sr[-20 : -10]` works by position.
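
When a single element counted from the end is needed, `.iloc` is one option, since it is purely positional. A small sketch with a four-item Series (the names `names`, `last`, and `label_lookup_failed` are illustrative):

```python
import pandas as pd

names = pd.Series(["Bulbasaur", "Ivysaur", "Venusaur", "Charmander"])

# .iloc selects by position, so negative positions count from the end.
last = names.iloc[-1]

# With the default integer index, names[-1] is looked up as a label and fails.
try:
    names[-1]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

print(last)                 # Charmander
print(label_lookup_failed)  # True
```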

In [42]:
pokemon.head()

0     Bulbasaur
1       Ivysaur
2      Venusaur
3    Charmander
4    Charmeleon
Name: Pokemon, dtype: object

In [43]:
pokemon[3]

'Charmander'

In [44]:
pokemon[:2]

0    Bulbasaur
1      Ivysaur
Name: Pokemon, dtype: object

In [45]:
pokemon[: 4 : 2]

0    Bulbasaur
2     Venusaur
Name: Pokemon, dtype: object

In [46]:
# pokemon[-1] raises a KeyError

pokemon[-5 : -1]

716    Yveltal
717    Zygarde
718    Diancie
719      Hoopa
Name: Pokemon, dtype: object

In [47]:
pokemon[-5 :]

716      Yveltal
717      Zygarde
718      Diancie
719        Hoopa
720    Volcanion
Name: Pokemon, dtype: object

In [48]:
pokemon[[1, 2, 720]]

1        Ivysaur
2       Venusaur
720    Volcanion
Name: Pokemon, dtype: object

### <font color=blue> Extract Series Value by Index </font>

In [49]:
poke_idx.head()

Type
Grass     Bulbasaur
Grass       Ivysaur
Grass      Venusaur
Fire     Charmander
Fire     Charmeleon
Name: Pokemon, dtype: object

In [50]:
poke_idx["Fire"][0]

'Charmander'

In [51]:
poke_idx["Grass"].head(2)

Type
Grass    Bulbasaur
Grass      Ivysaur
Name: Pokemon, dtype: object

In [52]:
poke_idx[1]

'Ivysaur'
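
The last cell relies on square brackets falling back to positional lookup when the index holds strings, which is how the pandas 1.5.2 used here behaves; newer pandas releases deprecate that fallback. A hedged sketch of the explicit alternatives, `.loc` for labels and `.iloc` for positions, on a small stand-in Series (the data and the name `by_type` are illustrative):

```python
import pandas as pd

types = ["Grass", "Grass", "Fire", "Fire"]
names = ["Bulbasaur", "Ivysaur", "Charmander", "Charmeleon"]
by_type = pd.Series(names, index=types)

# .loc selects by label; a duplicated label returns every match.
fire = by_type.loc["Fire"]

# .iloc selects by position, regardless of the index labels.
second = by_type.iloc[1]
first_fire = by_type.loc["Fire"].iloc[0]

print(list(fire))  # ['Charmander', 'Charmeleon']
print(second)      # Ivysaur
print(first_fire)  # Charmander
```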

### <font color=blue> get method </font>

* Its advantage over square-bracket indexing is that we can supply a fallback value to return when the label is not present.

In [53]:
pokemon.get(3)

'Charmander'

In [54]:
poke_idx.get("Grass")

Type
Grass     Bulbasaur
Grass       Ivysaur
Grass      Venusaur
Grass        Oddish
Grass         Gloom
            ...    
Grass       Chespin
Grass     Quilladin
Grass    Chesnaught
Grass        Skiddo
Grass        Gogoat
Name: Pokemon, Length: 66, dtype: object

In [55]:
poke_idx.get("Nor-World", default = "N/A")

'N/A'

In [56]:
# if any of the requested labels is missing from the Series, we get the default value

poke_idx.get(["Grass", "Nor-World"], default = "N/A")

'N/A'
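
For contrast, the sketch below shows what happens with a missing label under square brackets versus `get`, on a two-item stand-in Series (the data and variable names are illustrative):

```python
import pandas as pd

by_type = pd.Series(["Bulbasaur", "Charmander"], index=["Grass", "Fire"])

# Square brackets raise KeyError for a missing label.
try:
    by_type["Nor-World"]
    raised = False
except KeyError:
    raised = True

# .get returns None when no default is supplied ...
missing = by_type.get("Nor-World")

# ... or the fallback value when one is given.
fallback = by_type.get("Nor-World", default="N/A")

print(raised)    # True
print(missing)   # None
print(fallback)  # N/A
```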

### <font color=blue> Overwrite a Series Value </font>

* If the label being assigned does not exist, it is created and appended to the Series.

In [57]:
sr = pd.read_csv("./datasets/pokemon.csv", usecols=["Pokemon", "Type"], index_col="Pokemon").squeeze("columns")
sr

Pokemon
Bulbasaur       Grass
Ivysaur         Grass
Venusaur        Grass
Charmander       Fire
Charmeleon       Fire
               ...   
Yveltal          Dark
Zygarde        Dragon
Diancie          Rock
Hoopa         Psychic
Volcanion        Fire
Name: Type, Length: 721, dtype: object

In [58]:
sr["Bulbasaur"] = "Water"
sr.head()

Pokemon
Bulbasaur     Water
Ivysaur       Grass
Venusaur      Grass
Charmander     Fire
Charmeleon     Fire
Name: Type, dtype: object

In [59]:
sr[[0, 2]] = "Shadow"
sr.head()

Pokemon
Bulbasaur     Shadow
Ivysaur        Grass
Venusaur      Shadow
Charmander      Fire
Charmeleon      Fire
Name: Type, dtype: object

In [60]:
sr["Gauravior"] = "Universe"
sr

Pokemon
Bulbasaur       Shadow
Ivysaur          Grass
Venusaur        Shadow
Charmander        Fire
Charmeleon        Fire
                ...   
Zygarde         Dragon
Diancie           Rock
Hoopa          Psychic
Volcanion         Fire
Gauravior     Universe
Name: Type, Length: 722, dtype: object

In [61]:
del sr