## Introduction to Pandas
## Date: 25/1/22

Pandas has two main data structures: the Series (one-dimensional) and the DataFrame (two-dimensional).
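A minimal sketch of the two structures side by side. The column names and values are made up for illustration; note that each DataFrame column is itself a Series:

```python
import pandas as pd

# A Series is a one-dimensional labelled array
s = pd.Series([20, 23, 43])

# A DataFrame is a two-dimensional table (illustrative columns "name" and "age")
df = pd.DataFrame({"name": ["chris", "sam", "leah"], "age": [20, 23, 43]})

print(type(s))          # <class 'pandas.core.series.Series'>
print(type(df))         # <class 'pandas.core.frame.DataFrame'>
print(type(df["age"]))  # <class 'pandas.core.series.Series'>
```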
### Understanding Pandas Series

In [1]:
import numpy as np
import pandas as pd

In [4]:
# Create a Series from a list of ages
age = pd.Series([20, 23, 43, 21, 27, 25, 26])
age

0    20
1    23
2    43
3    21
4    27
5    25
6    26
dtype: int64

In [35]:
# Boolean (mask) operations work just like on a NumPy array (plain Python lists do not support this).
# Only the last expression in the cell is displayed.
age < 25
age[age<25]
age[age < age.mean()]

0    20
1    23
3    21
5    25
6    26
dtype: int64
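The mask step can be made explicit: the comparison returns a boolean Series of the same length, and indexing with it keeps only the positions marked `True`. A minimal sketch using the same ages with the default integer index:

```python
import pandas as pd

age = pd.Series([20, 23, 43, 21, 27, 25, 26])

mask = age < 25            # boolean Series: True where the age is below 25
print(mask.tolist())       # [True, True, False, True, False, False, False]

young = age[mask]          # keep only the rows where mask is True
print(young.tolist())        # [20, 23, 21]
print(young.index.tolist())  # [0, 1, 3]  (the original labels are preserved)
```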

In [20]:
# Explore the pandas Series object
print(type(age))
print(age.dtype)
print(age.values)
print(age.mean())

<class 'pandas.core.series.Series'>
int64
[20 23 43 21 27 25 26]
26.428571428571427
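The `values` attribute printed above is a plain NumPy array, while the Series keeps its labels separately in `index`. A small sketch confirming both; it uses `to_numpy()`, which the pandas documentation recommends over `.values`:

```python
import numpy as np
import pandas as pd

age = pd.Series([20, 23, 43, 21, 27, 25, 26])

arr = age.to_numpy()          # the underlying values as a NumPy array
print(type(arr))              # <class 'numpy.ndarray'>
print(age.index)              # RangeIndex(start=0, stop=7, step=1)
print(len(age), age.sum())    # 7 185  (185 / 7 gives the mean shown above)
```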


### Difference between NumPy array and pandas Series
- The essential difference is the index: a NumPy array has an implicitly defined integer index used to access its values, whereas a pandas Series has an explicitly defined index whose labels are associated with the values.
- A Series also differs from a Python list in that all of its elements share a single dtype, while a list can hold elements of different types. When a Series is built from mixed values, pandas falls back to the generic `object` dtype.
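To illustrate the second point, a short sketch: a list keeps each element's own Python type, whereas a Series stores one dtype for all of its values, upcasting or falling back to `object` when the inputs are mixed:

```python
import pandas as pd

mixed = [1, 2.5, "three"]
print([type(x).__name__ for x in mixed])  # ['int', 'float', 'str']

ints = pd.Series([1, 2, 3])
print(ints.dtype)                         # int64

# Mixing an int with a float upcasts every element to float64
print(pd.Series([1, 2.5]).dtype)          # float64

# Mixing numbers and strings falls back to the generic object dtype
print(pd.Series(mixed).dtype)             # object
```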

In [14]:
# numpy array
a = np.array(range(0, 5))
print('This is a numpy array ',a)
# pandas series
b = pd.Series(range(0, 5))
print('This is a Pandas series\n', b)

This is a numpy array  [0 1 2 3 4]
This is a Pandas series
 0    0
1    1
2    2
3    3
4    4
dtype: int64


With a pandas Series we can replace the default integer index with our own labels (here, names), for instance:

In [26]:
age.index = ['chris','sam', 'leah', 'zelda', 'ryan', 'ghosling', 'hemsworth']
print(age)

chris        20
sam          23
leah         43
zelda        21
ryan         27
ghosling     25
hemsworth    26
dtype: int64
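The labels can also be supplied when the Series is created, through the `index` argument (the same argument the exercise at the end of this section uses). A minimal sketch with the same names and ages:

```python
import pandas as pd

names = ['chris', 'sam', 'leah', 'zelda', 'ryan', 'ghosling', 'hemsworth']

# Values and labels are paired up position by position
age = pd.Series([20, 23, 43, 21, 27, 25, 26], index=names)

print(age.index.tolist() == names)  # True
print(age['leah'])                  # 43
```

Assigning to `age.index` afterwards works the same way, but the new list must contain exactly as many labels as there are values; otherwise pandas raises a `ValueError`.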


In [31]:
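# label-based indexing: a single value
age['chris']
# selecting several labels at once (pass a list of labels)
age[['chris', 'zelda']]
# position-based slicing with iloc (the end position is excluded)
# only this last expression is displayed
age.iloc[0:3]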

chris    20
sam      23
leah     43
dtype: int64
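The cell above mixes label-based and position-based access. Making the choice explicit with `.loc` (labels) and `.iloc` (integer positions) avoids ambiguity; a minimal sketch on the same labelled Series:

```python
import pandas as pd

age = pd.Series([20, 23, 43, 21, 27, 25, 26],
                index=['chris', 'sam', 'leah', 'zelda', 'ryan', 'ghosling', 'hemsworth'])

print(age.loc['chris'])                      # 20  (by label)
print(age.iloc[0])                           # 20  (by position)
print(age.loc[['chris', 'zelda']].tolist())  # [20, 21]
print(age.iloc[[0, 3]].tolist())             # [20, 21]
```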

In [33]:
# When slicing a Series by *label*, the end label is included.
# This differs from position-based slicing (iloc), Python lists and NumPy arrays,
# where the end position is excluded.
age['chris':'ryan']
# here 'ryan' is also included

chris    20
sam      23
leah     43
zelda    21
ryan     27
dtype: int64
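A side-by-side sketch of the two slicing rules on the same Series: the label slice stops at and includes `'ryan'`, while the positional slice stops just before position 4, which is `'ryan'`:

```python
import pandas as pd

age = pd.Series([20, 23, 43, 21, 27, 25, 26],
                index=['chris', 'sam', 'leah', 'zelda', 'ryan', 'ghosling', 'hemsworth'])

by_label = age.loc['chris':'ryan']  # label slice: 'ryan' IS included
by_position = age.iloc[0:4]         # position slice: position 4 ('ryan') is NOT included

print(by_label.index.tolist())     # ['chris', 'sam', 'leah', 'zelda', 'ryan']
print(by_position.index.tolist())  # ['chris', 'sam', 'leah', 'zelda']
```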

## Some exercises
## Date: 25/1/22

In [36]:
# Order (sort) the given pandas Series
X = pd.Series([4, 2, 5, 1, 3],
              index=['fourth', 'second', 'fifth', 'first', 'third'])
# sort_values returns a new, sorted Series, so reassign it
X = X.sort_values()
print(X)

first     1
second    2
third     3
fourth    4
fifth     5
dtype: int64
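`sort_values` orders by the values and returns a new Series (hence the reassignment above). Two related options that seem worth knowing are `ascending=False` for descending order and `sort_index`, which orders by the labels instead. A short sketch on the same Series:

```python
import pandas as pd

X = pd.Series([4, 2, 5, 1, 3],
              index=['fourth', 'second', 'fifth', 'first', 'third'])

# Largest value first
print(X.sort_values(ascending=False).tolist())  # [5, 4, 3, 2, 1]

# Alphabetical order of the index labels
print(X.sort_index().index.tolist())  # ['fifth', 'first', 'fourth', 'second', 'third']
```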
