# Pandas

In [1]:
import numpy as np
import pandas as pd

### Pandas data types

In [3]:
test = pd.Series([12.4, 11.4, 8.3, 7.1, 18.2])
test

0    12.4
1    11.4
2     8.3
3     7.1
4    18.2
dtype: float64

As shown, pandas automatically infers a data type (`dtype`) for the series
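
The inference can be checked on a few small inputs. This is a minimal sketch; the variable names are made up for illustration, and the integer width may differ on some platforms.

```python
import pandas as pd

# The dtype is inferred from the values passed in.
ints = pd.Series([1, 2, 3])
floats = pd.Series([1, 2.5, 3])      # one float makes the whole series float64
mixed = pd.Series([1, 'two', 3.0])   # mixed Python objects fall back to object

print(ints.dtype)    # an integer dtype (int64 on most platforms)
print(floats.dtype)  # float64
print(mixed.dtype)   # object

# A dtype can also be requested explicitly instead of inferred.
as_float = pd.Series([1, 2, 3], dtype='float64')
print(as_float.dtype)  # float64
```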

A series can be given a name to describe it

In [4]:
test.name = 'Example Data'
test

0    12.4
1    11.4
2     8.3
3     7.1
4    18.2
Name: Example Data, dtype: float64

In [5]:
test.values

array([12.4, 11.4,  8.3,  7.1, 18.2])

#### As with a regular list, individual elements can be selected

In [6]:
test[3]

7.1

#### Unlike a list, however, a series can have its index changed, like so

In [7]:
test.index = [
    'First',
    'Second',
    'Third',
    'Fourth',
    'Fifth'
]
test

First     12.4
Second    11.4
Third      8.3
Fourth     7.1
Fifth     18.2
Name: Example Data, dtype: float64

The series can be created with a custom index and name from the start if preferred

In [9]:
test2 = pd.Series(
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    index = ['Red', 'Green', 'Blue'],
    name = 'RGB'
)
test2

Red      (255, 0, 0)
Green    (0, 255, 0)
Blue     (0, 0, 255)
Name: RGB, dtype: object

In [10]:
test2['Blue']

(0, 0, 255)

In [17]:
test2[['Green', 'Red']]

Green    (0, 255, 0)
Red      (255, 0, 0)
Name: RGB, dtype: object

#### In pandas, slicing by label includes the upper bound, unlike Python list slicing

In [21]:
l = [1, 2, 3, 4]
print(l[:2])
test2['Red':'Blue']

[1, 2]


Red      (255, 0, 0)
Green    (0, 255, 0)
Blue     (0, 0, 255)
Name: RGB, dtype: object
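
The inclusive behaviour applies to label slices only; positional slices still exclude the end, as in plain Python. A minimal sketch of the contrast, using a hypothetical `colours` series with the same values as `test2`:

```python
import pandas as pd

colours = pd.Series(
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    index=['Red', 'Green', 'Blue'],
    name='RGB',
)

by_label = colours.loc['Red':'Green']   # label slice: 'Green' is included
by_position = colours.iloc[0:1]         # positional slice: position 1 is excluded

print(list(by_label.index))     # ['Red', 'Green']
print(list(by_position.index))  # ['Red']
```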

# Boolean series

As with NumPy arrays, operations are applied element-wise to the series:

In [22]:
test * 10

First     124.0
Second    114.0
Third      83.0
Fourth     71.0
Fifth     182.0
Name: Example Data, dtype: float64

In [24]:
(test * 10).mean()

114.8

In [25]:
test

First     12.4
Second    11.4
Third      8.3
Fourth     7.1
Fifth     18.2
Name: Example Data, dtype: float64

In [26]:
test > 12

First      True
Second    False
Third     False
Fourth    False
Fifth      True
Name: Example Data, dtype: bool

In [27]:
test[test > 12]

First    12.4
Fifth    18.2
Name: Example Data, dtype: float64

In [29]:
test[(test > 8) & (test < 12)]

Second    11.4
Third      8.3
Name: Example Data, dtype: float64
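
Conditions can also be combined with `|` (or) and negated with `~` (not). The sketch below uses a hypothetical `readings` series with the same values as `test`; each comparison needs its own parentheses because `&`, `|` and `~` bind more tightly than `<` and `>`.

```python
import pandas as pd

readings = pd.Series(
    [12.4, 11.4, 8.3, 7.1, 18.2],
    index=['First', 'Second', 'Third', 'Fourth', 'Fifth'],
)

low_or_high = readings[(readings < 8) | (readings > 12)]  # either condition
not_high = readings[~(readings > 12)]                     # negated condition

print(low_or_high.index.tolist())  # ['First', 'Fourth', 'Fifth']
print(not_high.index.tolist())     # ['Second', 'Third', 'Fourth']
```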

### Modifying Series

In [32]:
test['Third'] = 19.2
test

First     12.4
Second    11.4
Third     19.2
Fourth     7.1
Fifth     18.2
Name: Example Data, dtype: float64

In [33]:
test[test > 19] = 20
test

First     12.4
Second    11.4
Third     20.0
Fourth     7.1
Fifth     18.2
Name: Example Data, dtype: float64
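
These assignments change the series in place. If the original values are still needed, one option is to modify a copy instead. A minimal sketch, with hypothetical names `original` and `capped`:

```python
import pandas as pd

original = pd.Series([12.4, 11.4, 8.3], index=['First', 'Second', 'Third'])

# Work on an explicit copy so the original series keeps its values.
capped = original.copy()
capped[capped > 12] = 12

print(original['First'])  # 12.4
print(capped['First'])    # 12.0
```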

<br>
<br>
<br>
<br>

# Data Frames

A data frame is more like a table, with rows and named columns

Data frames will most likely be loaded from a CSV file, as creating one manually is more tedious
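
As a rough sketch of the CSV route, the example below reads CSV text from memory with `pd.read_csv`. The `Country` column name and the `weather` variable are assumptions made for illustration; with a real file, a path would be passed in place of the `StringIO` object.

```python
import io

import pandas as pd

# In-memory CSV text standing in for a file on disk.
csv_text = """Country,Rainfall,Temperature,Wind
England,15,10,20
France,12,13,5
Germany,11,14,7
"""

# index_col turns the 'Country' column into the row index.
weather = pd.read_csv(io.StringIO(csv_text), index_col='Country')

print(weather.shape)                 # (3, 3)
print(weather.loc['France', 'Wind']) # 5
```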

In [39]:
df = pd.DataFrame(
    {
        'Rainfall': [15, 12, 11],
        'Temperature': [10, 13, 14],
        'Wind': [20, 5, 7],
    },
    columns=['Rainfall', 'Temperature', 'Wind'],
    index=['England', 'France', 'Germany'],
)

df

         Rainfall  Temperature  Wind
England        15           10    20
France         12           13     5
Germany        11           14     7


In [40]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, England to Germany
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   Rainfall     3 non-null      int64
 1   Temperature  3 non-null      int64
 2   Wind         3 non-null      int64
dtypes: int64(3)
memory usage: 84.0+ bytes


In [43]:
df.size

9

In [44]:
df.shape

(3, 3)

In [45]:
df.describe()

        Rainfall  Temperature       Wind
count        3.0          3.0        3.0
mean   12.666667    12.333333  10.666667
std     2.081666     2.081666   8.144528
min         11.0         10.0        5.0
25%         11.5         11.5        6.0
50%         12.0         13.0        7.0
75%         13.5         13.5       13.5
max         15.0         14.0       20.0


In [46]:
df.dtypes

Rainfall       int64
Temperature    int64
Wind           int64
dtype: object

<br>

#### Indexing, Selection and Slicing

By Index

In [47]:
df.loc['England']

Rainfall       15
Temperature    10
Wind           20
Name: England, dtype: int64

By Position

In [48]:
df.iloc[1]

Rainfall       12
Temperature    13
Wind            5
Name: France, dtype: int64

By Column

In [51]:
df['Temperature']

England    10
France     13
Germany    14
Name: Temperature, dtype: int64

All of the results returned above are series

The same slicing rules as for series apply, for example selecting all rows from England to France. You can also be more specific, as shown below

In [53]:
df.loc['England': 'France', ['Rainfall']]

         Rainfall
England        15
France         12
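
Whether the column selector is a list or a single label decides the type of the result. A minimal sketch, rebuilding the same table under the hypothetical name `weather`:

```python
import pandas as pd

weather = pd.DataFrame(
    {'Rainfall': [15, 12, 11], 'Temperature': [10, 13, 14], 'Wind': [20, 5, 7]},
    index=['England', 'France', 'Germany'],
)

as_frame = weather.loc['England':'France', ['Rainfall']]  # list of columns -> DataFrame
as_series = weather.loc['England':'France', 'Rainfall']   # single label -> Series
single = weather.loc['France', 'Rainfall']                # row and column label -> scalar

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series
print(single)                    # 12
```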


<br>

## Boolean Arrays

In [54]:
df

         Rainfall  Temperature  Wind
England        15           10    20
France         12           13     5
Germany        11           14     7


In [57]:
df['Wind'] > 6

England     True
France     False
Germany     True
Name: Wind, dtype: bool

In [56]:
df[df['Wind'] > 6]

         Rainfall  Temperature  Wind
England        15           10    20
Germany        11           14     7


In [58]:
df.loc[df['Wind'] > 6, ['Wind']]

         Wind
England    20
Germany     7
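
Conditions on different columns can be combined in the same way as for a series. A minimal sketch, assuming the same table rebuilt under the hypothetical name `weather`:

```python
import pandas as pd

weather = pd.DataFrame(
    {'Rainfall': [15, 12, 11], 'Temperature': [10, 13, 14], 'Wind': [20, 5, 7]},
    index=['England', 'France', 'Germany'],
)

# Rows where both conditions hold; each comparison needs its own parentheses.
windy_and_mild = weather[(weather['Wind'] > 6) & (weather['Temperature'] > 12)]
print(windy_and_mild.index.tolist())  # ['Germany']

# The same mask can be paired with a column list in .loc.
subset = weather.loc[weather['Wind'] > 6, ['Rainfall', 'Wind']]
print(subset.shape)  # (2, 2)
```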


<br>

## Dropping

In [59]:
df.drop(['Germany'])

         Rainfall  Temperature  Wind
England        15           10    20
France         12           13     5


As with the operations above, `drop` returns a new data frame and doesn't permanently modify the original
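
To keep the result, assign it to a name. The sketch below, using a hypothetical `weather` frame with the same values, also shows dropping a column via the `columns` keyword.

```python
import pandas as pd

weather = pd.DataFrame(
    {'Rainfall': [15, 12, 11], 'Temperature': [10, 13, 14], 'Wind': [20, 5, 7]},
    index=['England', 'France', 'Germany'],
)

without_germany = weather.drop(['Germany'])     # drops a row by index label
without_wind = weather.drop(columns=['Wind'])   # drops a column by name

print('Germany' in weather.index)          # True: the original is unchanged
print('Germany' in without_germany.index)  # False
print(list(without_wind.columns))          # ['Rainfall', 'Temperature']
```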

<br>

#### Operations

In [61]:
df[['Wind']] * 1.6

         Wind
England  32.0
France    8.0
Germany  11.2


For example, this converts wind speed from, say, mph to km/h (using the approximate factor 1.6)
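
The multiplication on its own does not store anything. A minimal sketch of keeping the converted values as a new column, assuming the wind values are in mph; the names `weather`, `MPH_TO_KMH` and `Wind (km/h)` are made up for illustration.

```python
import pandas as pd

weather = pd.DataFrame(
    {'Wind': [20, 5, 7]},
    index=['England', 'France', 'Germany'],
)

MPH_TO_KMH = 1.609344  # one mile is defined as exactly 1.609344 km

# Assigning the result stores it as a column; 'Wind' itself is left untouched.
weather['Wind (km/h)'] = weather['Wind'] * MPH_TO_KMH

print(weather.round(2))
```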

<br>
<br>

## Modifying the data frame
#### You can use a series to add a new column:

In [65]:
UV = pd.Series(
    [1, 3],
    index = ['England', 'Germany'])
df['UV'] = UV

In [66]:
df

         Rainfall  Temperature  Wind   UV
England        15           10    20  1.0
France         12           13     5  NaN
Germany        11           14     7  3.0
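
The new column is matched to rows by index label, not by position, which is why France has no value. A minimal sketch of that alignment, using a hypothetical `weather` frame and `uv` series:

```python
import pandas as pd

weather = pd.DataFrame(
    {'Rainfall': [15, 12, 11]},
    index=['England', 'France', 'Germany'],
)

# The series has no entry for 'France'.
uv = pd.Series([1, 3], index=['England', 'Germany'])
weather['UV'] = uv

print(weather['UV'].isna().tolist())  # [False, True, False]
print(weather['UV'].dtype)            # float64, since NaN is a float value
```

The integers in `uv` become floats in the data frame because the missing entry is stored as `NaN`.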


Renaming columns / indexes

In [70]:
df.rename(
    columns = {
        'Temperature': 'Temp'
    })

         Rainfall  Temp  Wind   UV
England        15    10    20  1.0
France         12    13     5  NaN
Germany        11    14     7  3.0
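
Like `drop`, `rename` returns a new data frame, so `df` still has a `Temperature` column afterwards. A minimal sketch covering both column and index labels; the `weather` frame and the `DE` label are made up for illustration.

```python
import pandas as pd

weather = pd.DataFrame(
    {'Rainfall': [15, 12, 11], 'Temperature': [10, 13, 14]},
    index=['England', 'France', 'Germany'],
)

renamed = weather.rename(
    columns={'Temperature': 'Temp'},  # old column name -> new column name
    index={'Germany': 'DE'},          # old row label -> new row label
)

print(list(renamed.columns))  # ['Rainfall', 'Temp']
print(list(renamed.index))    # ['England', 'France', 'DE']
print(list(weather.columns))  # ['Rainfall', 'Temperature'] (original unchanged)
```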


Making new columns using current columns

In [71]:
df['UV per Temp'] = df['UV'] / df['Temperature']

In [72]:
df

         Rainfall  Temperature  Wind   UV  UV per Temp
England        15           10    20  1.0          0.1
France         12           13     5  NaN          NaN
Germany        11           14     7  3.0     0.214286


In [74]:
df.describe()

        Rainfall  Temperature       Wind        UV  UV per Temp
count        3.0          3.0        3.0       2.0          2.0
mean   12.666667    12.333333  10.666667       2.0     0.157143
std     2.081666     2.081666   8.144528  1.414214     0.080812
min         11.0         10.0        5.0       1.0          0.1
25%         11.5         11.5        6.0       1.5     0.128571
50%         12.0         13.0        7.0       2.0     0.157143
75%         13.5         13.5       13.5       2.5     0.185714
max         15.0         14.0       20.0       3.0     0.214286
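
The count of 2.0 for `UV` and `UV per Temp` reflects that missing values are skipped by these statistics. A minimal sketch on a standalone series with the same `UV` values (the name `uv` is hypothetical):

```python
import numpy as np
import pandas as pd

uv = pd.Series([1.0, np.nan, 3.0], index=['England', 'France', 'Germany'])

print(len(uv))          # 3: total number of entries
print(uv.count())       # 2: non-missing entries only
print(uv.isna().sum())  # 1: number of missing entries
print(uv.mean())        # 2.0: NaN is skipped by default
```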


<br>
<br>