# 30 minute introduction to Pandas
After working through this introduction, you should be able to get started with pandas. It is based on the following source:
1. 10 minute introduction to pandas (http://pandas.pydata.org/pandas-docs/stable/10min.html)


The following topics are present in this introduction:

1. Basics and viewing:
    1. Importing data (CSV) and generating a dataframe
    2. Viewing (head and tail)
    3. Selection (rows, columns, slicing, label, position)
    4. Indexing (boolean masks, isin)
2. Basic working with data:
    1. Setting
    2. Information from data
    3. Plotting
3. Groupby

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib notebook

## 1.1 Importing data and generating a dataframe

In [None]:
# Importing data from a csv
delft_april = pd.read_csv('weather.csv')
delft_april

## 1.2 Viewing using head and tail

In [None]:
delft_april.head()

In [None]:
delft_april.tail(1)

## 1.3 Selecting: rows, columns, and slicing

In [None]:
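# Plain [] slicing also works, but .loc makes label-based selection explicit
# .loc slices include the end label, so this returns the rows labelled 0 through 5
# Selecting a single row (e.g. .loc[0]) returns a Series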
delft_april.loc[0:5]
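
The difference between label-based and position-based slicing is easy to miss, so here is a minimal sketch of it. The small `df` frame and its values are made up for illustration and stand in for the weather data.

```python
import pandas as pd

# Hypothetical stand-in for the weather data (values are invented)
df = pd.DataFrame({'HIGH': [10.1, 12.3, 9.8, 11.0],
                   'LOW': [2.9, 4.1, 3.3, 5.0]})

by_label = df.loc[0:2]      # labels 0, 1 and 2: the end label is included
by_position = df.iloc[0:2]  # positions 0 and 1: the end position is excluded
single_row = df.loc[0]      # a single label returns a Series

print(len(by_label), len(by_position), type(single_row).__name__)
```

With the default integer index, labels and positions coincide, which is why the two slices look so similar yet differ by one row.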

In [None]:
# Selecting one or multiple columns
delft_april[['HIGH', 'LOW']]

In [None]:
# Selecting by position is done using .iloc
delft_april.iloc[0:5, [1, 5, 6]]


## 1.4 Indexing

In [None]:
# Boolean indexing: combine conditions with & (and) or | (or)
# Wrap each condition in parentheses, because & binds tighter than > and <
delft_april[(delft_april['MEANTEMP'] > 10) & (delft_april['LOW'] < 8)]
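
It can help to build the boolean mask as a separate step and inspect it before filtering. The sketch below does this on a small, made-up frame; the names `df`, `mask` and `warm_days_cold_nights` are illustrative only.

```python
import pandas as pd

# Hypothetical data with the same column names as the weather file
df = pd.DataFrame({'MEANTEMP': [9.5, 11.2, 12.0, 10.4],
                   'LOW': [7.0, 8.5, 6.1, 9.0]})

# Each comparison yields a boolean Series; & combines them element-wise
mask = (df['MEANTEMP'] > 10) & (df['LOW'] < 8)

# Indexing with the mask keeps only the rows where it is True
warm_days_cold_nights = df[mask]

print(mask.tolist())
print(warm_days_cold_nights)
```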

In [None]:
# A single value by position: first row, last column
delft_april.iloc[0, -1]

In [None]:
# The .isin() function
delft_april[delft_april['DOMWDIR'].isin(['NW', 'SW'])]
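
Since `.isin()` returns a boolean Series, it can be inverted with `~` to select everything outside the list. A minimal sketch on invented wind directions:

```python
import pandas as pd

# Hypothetical wind directions (values are invented)
df = pd.DataFrame({'DOMWDIR': ['NW', 'SW', 'E', 'NNW', 'SW']})

in_list = df['DOMWDIR'].isin(['NW', 'SW'])

selected = df[in_list]    # rows whose direction is NW or SW
remaining = df[~in_list]  # all other rows

print(list(selected.index), list(remaining.index))
```

Note that `.isin()` matches whole values, so `'NNW'` is not matched by `'NW'`.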

## 2.1 Setting

In [None]:
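# Adding another data entry (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
data_twenty_six = [26, 6.7, 10.1, '15:09', 2.9, '06:11', 11.6, 0.0, 1.4, 5.6, 31.4, '17:24', 'NNW']
twenty_sixth = pd.Series(data_twenty_six, index=delft_april.columns)

In [None]:
# pd.concat returns a new DataFrame; delft_april itself is left unchanged
pd.concat([delft_april, twenty_sixth.to_frame().T], ignore_index=True)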
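
In pandas 2.0 and later, `DataFrame.append` is no longer available and `pd.concat` is the way to add rows. The sketch below, on a small made-up frame, builds the new row as a one-row DataFrame so that the column dtypes are preserved; the names `df`, `new_row` and `extended` are illustrative.

```python
import pandas as pd

# Hypothetical three-column excerpt of the weather data (values are invented)
df = pd.DataFrame({'DAY': [24, 25],
                   'HIGH': [9.4, 11.0],
                   'DOMWDIR': ['SW', 'NW']})

# A one-row DataFrame built from a dict keeps one dtype per column
new_row = pd.DataFrame([{'DAY': 26, 'HIGH': 10.1, 'DOMWDIR': 'NNW'}])

# concat returns a new frame; ignore_index renumbers the rows 0..n-1
extended = pd.concat([df, new_row], ignore_index=True)

print(extended)
```

The original frame is not modified, so the result has to be assigned to a name if it should be kept.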

In [None]:
delft_april.index

In [None]:
# Setting the index to a date range
delft_april.index = pd.date_range('20170401', periods=len(delft_april))
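
Once the index consists of dates, rows can be selected by date instead of by row number. A minimal sketch, assuming a made-up three-day frame:

```python
import pandas as pd

# Hypothetical daily highs (values are invented)
df = pd.DataFrame({'HIGH': [10.1, 12.3, 9.8]})

# One timestamp per row, starting on 1 April 2017 with daily frequency
df.index = pd.date_range('20170401', periods=len(df))

# Date strings work as slice bounds, and both ends are included
first_two = df.loc['2017-04-01':'2017-04-02']

# A single cell can be looked up with a Timestamp label
high_on_second = df.loc[pd.Timestamp('2017-04-02'), 'HIGH']

print(first_two)
print(high_on_second)
```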

In [None]:
delft_april

In [None]:
# Remove some columns using .drop command
delft_april = delft_april.drop(['HEATDEGDAYS', 'COOLDEGDAYS', 'DAY'], axis=1)

In [None]:
delft_april.head()

In [None]:
# Setting the column labels
delft_april.columns = ['MEANTEMP', 'HIGH', 'TIME_H', 'LOW', 'TIME_L', 'RAIN',
       'AVGWINDSPEED', 'HIGH_W', 'TIME_W', 'DOMWDIR']

In [None]:
# Checking the new column labels
delft_april.columns

## 2.2 Information from data

In [None]:
# Mean of each numeric column (numeric_only skips text columns such as times and wind direction)
delft_april.mean(numeric_only=True)
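
Since pandas 2.0, `mean()` no longer drops non-numeric columns by default, so frames that mix numbers and text need `numeric_only=True`. A minimal sketch on invented data:

```python
import pandas as pd

# Hypothetical frame mixing numeric and text columns (values are invented)
df = pd.DataFrame({'HIGH': [10.0, 12.0],
                   'LOW': [2.0, 4.0],
                   'DOMWDIR': ['SW', 'NW']})

# Only HIGH and LOW are averaged; DOMWDIR is left out of the result
column_means = df.mean(numeric_only=True)

print(column_means)
```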

In [None]:
# Many reductions work as in NumPy, e.g. max, min, std
delft_april['MEANTEMP'].max()

In [None]:
# The .describe() function gives summary statistics for each numeric column
delft_april.describe()

## 2.3 Plotting

In [None]:
# Plotting, single and multiple lines
delft_april[['MEANTEMP', 'HIGH', 'LOW']].plot(title='Temperatures in Delft during April 2017')


In [None]:
delft_april.columns

In [None]:
# Are the daily highs larger when they occur later in the day?
# Plot the daily high against the time at which it occurred
# (TIME_H holds zero-padded 'HH:MM' strings, so sorting them gives chronological order)
delft_april.sort_values(by='TIME_H').plot(x='TIME_H', y='HIGH')


In [None]:
delft_april[['MEANTEMP', 'HIGH']].plot.hist(alpha=.5)

In [None]:
delft_april[['MEANTEMP', 'HIGH', 'LOW']].plot.box()

## 3 Groupby
1. Split the data into groups based on some criterion
2. Apply a function to each group independently
3. Combine the results into a data structure
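
The three steps above can be sketched by hand and compared with the one-line `groupby` call. The frame and its values are made up for illustration, and the manual version is only meant to show what happens conceptually, not how pandas implements it.

```python
import pandas as pd

# Hypothetical data (values are invented)
df = pd.DataFrame({'DOMWDIR': ['SW', 'NW', 'SW', 'NW'],
                   'HIGH': [10.0, 8.0, 12.0, 9.0]})

# 1. Split: iterating over a groupby yields (group name, sub-frame) pairs
groups = {name: group for name, group in df.groupby('DOMWDIR')}

# 2. Apply: compute the mean of HIGH within each group
means = {name: group['HIGH'].mean() for name, group in groups.items()}

# 3. Combine: collect the per-group results into one Series
manual = pd.Series(means, name='HIGH')

# The same result in a single step
direct = df.groupby('DOMWDIR')['HIGH'].mean()

print(manual)
print(direct)
```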

In [None]:
delft_april.head()

In [None]:
delft_april['DOMWDIR'].unique()

In [None]:
# Grouping by the wind direction
grouped_wd = delft_april.groupby('DOMWDIR')

In [None]:
len(grouped_wd)

In [None]:
# Viewing a specific group
grouped_wd.get_group('SW')

In [None]:
grouped_wd.describe()

In [None]:
# Grouping and applying a function (numeric_only skips the text columns)
grouped_wd_mean = delft_april.groupby('DOMWDIR').mean(numeric_only=True)
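
When more than one statistic per group is needed, `.agg()` accepts a list of function names. A minimal sketch on invented data; the name `summary` is illustrative.

```python
import pandas as pd

# Hypothetical data (values are invented)
df = pd.DataFrame({'DOMWDIR': ['SW', 'NW', 'SW', 'NW'],
                   'HIGH': [10.0, 8.0, 12.0, 9.0]})

# One row per wind direction, one column per requested statistic
summary = df.groupby('DOMWDIR')['HIGH'].agg(['mean', 'max'])

print(summary)
```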