## Introduction to Pandas 1

_from pandas documentation_

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

A good introduction is the "10 minutes to pandas" guide: https://pandas.pydata.org/pandas-docs/stable/10min.html#

## Demo with long-term rain data

Import modules

In [1]:
import pandas as pd


Set some variables to make our lives easier

In [2]:
excelfile = 'Input_data/DCA_longterm_hourly.xlsx'

Import data into a DataFrame, reading only the columns at positions 4 and 9 of the spreadsheet (counted from zero)

In [3]:
df = pd.read_excel(excelfile, usecols=[4, 9])
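
As a rough illustration of how `usecols` picks columns by zero-based position, here is a minimal sketch. It uses `pd.read_csv` on an in-memory table only so that it runs without a spreadsheet file; the column names other than `Datetime` and `Precip_clean`, and all of the values, are made up for the example and are not taken from the real workbook.

```python
import io
import pandas as pd

# Hypothetical five-column table standing in for the spreadsheet (invented data)
raw = io.StringIO(
    "station,flag,Datetime,Precip_raw,Precip_clean\n"
    "A,0,1948-05-01 01:00:00,0.00,0.00\n"
    "A,0,1948-05-01 16:00:00,0.01,0.01\n"
)

# Keep only the columns at positions 2 and 4 (counted from zero)
# and parse the Datetime column into timestamps while reading
demo = pd.read_csv(raw, usecols=[2, 4], parse_dates=["Datetime"])

print(demo.columns.tolist())
print(demo.dtypes)
```

Selecting by position is compact, but it silently picks the wrong data if the spreadsheet layout changes; passing column names to `usecols` is a sturdier alternative when the names are known.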

Check out the data

In [4]:
df.head()

| | Datetime | Precip_clean |
|---|---|---|
| 0 | 1948-05-01 01:00:00 | 0.0 |
| 1 | 1948-05-01 16:00:00 | 0.01 |
| 2 | 1948-05-01 17:00:00 | 0.02 |
| 3 | 1948-05-02 15:00:00 | 0.02 |
| 4 | 1948-05-02 16:00:00 | 0.12 |


In [5]:
df.describe()

| | Precip_clean |
|---|---|
| count | 47862.0 |
| mean | 0.054382 |
| std | 0.109126 |
| min | 0.0 |
| 25% | 0.01 |
| 50% | 0.02 |
| 75% | 0.06 |
| max | 3.29 |


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 47862 entries, 0 to 47861
Data columns (total 2 columns):
Datetime        47862 non-null datetime64[ns]
Precip_clean    47862 non-null float64
dtypes: datetime64[ns](1), float64(1)
memory usage: 747.9 KB


Total rainfall for 1990

In [7]:
df1990 = df[(df['Datetime'] > '1990-01-01') & (df['Datetime'] < '1991-01-01')]
df1990['Precip_clean'].sum()

40.840000000000003
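
Note that the comparison `> '1990-01-01'` is strict, so a reading stamped exactly at midnight on 1 January 1990 would be left out of the total. Whether the real data contains such a reading is not shown here. The sketch below uses a small made-up table to compare the strict filter with selecting on the `.dt.year` accessor, which keeps every timestamp in the calendar year.

```python
import pandas as pd

# Made-up readings around the year boundaries (not the real rainfall data)
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "1989-12-31 23:00", "1990-01-01 00:00", "1990-01-01 01:00",
        "1990-12-31 23:00", "1991-01-01 00:00",
    ]),
    "Precip_clean": [0.10, 0.20, 0.12, 0.08, 0.30],
})

# Strict lower bound: the 1990-01-01 00:00 reading is excluded
strict = df[(df["Datetime"] > "1990-01-01") & (df["Datetime"] < "1991-01-01")]

# Selecting on the year keeps every 1990 timestamp, including midnight
by_year = df[df["Datetime"].dt.year == 1990]

print(len(strict), len(by_year))  # 2 3
```

Using `>=` for the lower bound in the original filter would give the same rows as the `.dt.year` version.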

Export results to a new Excel spreadsheet

In [8]:
df1990.to_excel('Output_data/df1990.xlsx')

## Try it yourself

In [12]:
df1990.loc[:, "test"] = df1990.loc[:, "Precip_clean"] * 2.0 + 35

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
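
This message is pandas' `SettingWithCopyWarning`. It appears because `df1990` was produced by filtering `df`, so pandas cannot tell whether the new column is meant for `df1990` alone or for the original data. One common way to avoid it, sketched below on a small made-up table, is to take an explicit `.copy()` when creating the subset. The values here are invented for illustration.

```python
import pandas as pd

# Made-up stand-in for the rainfall data
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "1989-12-31 23:00", "1990-01-01 01:00",
        "1990-06-15 12:00", "1991-01-01 00:00",
    ]),
    "Precip_clean": [0.05, 0.12, 0.08, 0.30],
})

mask = (df["Datetime"] >= "1990-01-01") & (df["Datetime"] < "1991-01-01")

# .copy() makes df1990 an independent DataFrame,
# so adding columns to it is unambiguous and does not warn
df1990 = df[mask].copy()
df1990["test"] = df1990["Precip_clean"] * 2.0 + 35
df1990["hello"] = df1990["test"] * 2.0

print(df1990)
```

The original `df` is left untouched: the new columns exist only on the copy.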


In [14]:
df1990["hello"] = df1990["test"] * 2.0

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


In [15]:
df1990

| | Datetime | Precip_clean | test | hello |
|---|---|---|---|---|
| 24181 | 1990-01-01 01:00:00 | 0.12 | 35.24 | 70.48 |
| 24182 | 1990-01-01 02:00:00 | 0.08 | 35.16 | 70.32 |
| 24183 | 1990-01-04 09:00:00 | 0.01 | 35.02 | 70.04 |
| 24184 | 1990-01-04 10:00:00 | 0.03 | 35.06 | 70.12 |
| 24185 | 1990-01-04 11:00:00 | 0.02 | 35.04 | 70.08 |
| 24186 | 1990-01-06 05:00:00 | 0.01 | 35.02 | 70.04 |
| 24187 | 1990-01-06 06:00:00 | 0.02 | 35.04 | 70.08 |
| 24188 | 1990-01-06 07:00:00 | 0.01 | 35.02 | 70.04 |
| 24189 | 1990-01-08 10:00:00 | 0.03 | 35.06 | 70.12 |
| 24190 | 1990-01-08 11:00:00 | 0.15 | 35.30 | 70.60 |
