# pandas


In [1]:
import pandas as pd



## Core components
- Series
- DataFrames

### Series
- A one-dimensional labeled array: essentially a single column of data
- Can be indexed with `[]` (by label, slice, or boolean mask)

Create from a list/array:

In [2]:
# Create a Series from a list
series = pd.Series([25.8, 16.2, 17.9, 18.8, 23.6, 29.9, 23.6, 22.1])

series

0    25.8
1    16.2
2    17.9
3    18.8
4    23.6
5    29.9
6    23.6
7    22.1
dtype: float64
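
The `[]` indexing noted in the Series bullets can be sketched like this. The label names `'mon'`, `'tue'`, `'wed'` are made up for illustration.

```python
import pandas as pd

# Same values as above, with the default integer index 0..7
series = pd.Series([25.8, 16.2, 17.9, 18.8, 23.6, 29.9, 23.6, 22.1])

first = series[0]            # single label -> a scalar
middle = series[2:5]         # slice -> a new Series with rows 2, 3, 4
big = series[series > 23]    # boolean mask -> only values above 23

# A Series can carry its own (hypothetical) labels; [] then looks up by label
labelled = pd.Series([25.8, 16.2, 17.9], index=['mon', 'tue', 'wed'])
tue = labelled['tue']

print(first, middle.tolist(), big.tolist(), tue)
```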

In [3]:
series2 = pd.Series([3, 5, 1, 2.4, '3.7', 3])
series2
# dtype is object: the list mixes numbers with the string '3.7', and pandas keeps the string as-is rather than converting it to a float

0      3
1      5
2      1
3    2.4
4    3.7
5      3
dtype: object
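
To confirm that the `'3.7'` entry really is still a string, and to get a numeric Series back, one option is `pd.to_numeric`, which parses numeric strings. This is a minimal sketch; `.astype(float)` would also work for this data.

```python
import pandas as pd

series2 = pd.Series([3, 5, 1, 2.4, '3.7', 3])
print(series2.dtype)        # object: mixed numbers and a str
print(type(series2[4]))     # the '3.7' entry is still a str

# Parse every entry as a number; the result is a float Series
numeric = pd.to_numeric(series2)
print(numeric.dtype)        # float64
```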

### DataFrame
- Two-dimensional table made up of multiple Series (one per column) that share a row index

Create from list of lists or dictionaries:

In [5]:
# List of lists
df = pd.DataFrame([[25.8, 28.1, 16.2, 11.0],[17.9, 14.2, 18.8, 28.0],
                   [23.6, 18.4, 29.9, 27.8],[23.6, 36.2, 22.1, 14.5]],
                 columns=['A','B','C','D'])
df
# without columns=, the column labels default to integers 0, 1, 2, ... just like the row index

|   | A | B | C | D |
| :- | :- | :- | :- | :- |
| 0 | 25.8 | 28.1 | 16.2 | 11.0 |
| 1 | 17.9 | 14.2 | 18.8 | 28.0 |
| 2 | 23.6 | 18.4 | 29.9 | 27.8 |
| 3 | 23.6 | 36.2 | 22.1 | 14.5 |
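
A quick check of the comment in the cell above about default column labels, as a small sketch using the same data:

```python
import pandas as pd

rows = [[25.8, 28.1, 16.2, 11.0], [17.9, 14.2, 18.8, 28.0],
        [23.6, 18.4, 29.9, 27.8], [23.6, 36.2, 22.1, 14.5]]

no_names = pd.DataFrame(rows)                          # no columns= argument
named = pd.DataFrame(rows, columns=['A', 'B', 'C', 'D'])

default_labels = list(no_names.columns)
print(default_labels)           # [0, 1, 2, 3]

# Labels can also be assigned after construction
no_names.columns = ['A', 'B', 'C', 'D']
print(no_names.equals(named))   # True
```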


In [6]:
# Dictionaries
df = pd.DataFrame({'A': [25.8, 17.9, 23.6, 23.6],
                   'B': [28.1, 14.2, 18.4, 36.2],
                   'C': [16.2, 18.8, 29.9, 22.1],
                   'D': [11.0, 28.0, 27.8, 14.5]})

df
# each key becomes a column name and each list of values becomes that column

|   | A | B | C | D |
| :- | :- | :- | :- | :- |
| 0 | 25.8 | 28.1 | 16.2 | 11.0 |
| 1 | 17.9 | 14.2 | 18.8 | 28.0 |
| 2 | 23.6 | 18.4 | 29.9 | 27.8 |
| 3 | 23.6 | 36.2 | 22.1 | 14.5 |


**Note:** list-of-lists and dictionary construction are oriented in opposite ways. With a list of lists, each inner list is a row; with a dictionary, each list of values is a column.
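
The orientation difference can be checked directly: the two constructors below describe the same table. The row labels `'r0'` and `'r1'` are hypothetical; `DataFrame.from_dict(..., orient='index')` is shown as one way to read a dictionary row-wise instead.

```python
import pandas as pd

# Row-oriented: each inner list is one row
from_rows = pd.DataFrame([[25.8, 28.1], [17.9, 14.2]], columns=['A', 'B'])

# Column-oriented: each dict value is one column
from_cols = pd.DataFrame({'A': [25.8, 17.9], 'B': [28.1, 14.2]})

print(from_rows.equals(from_cols))  # True

# orient='index' treats each dict value as a row; keys become row labels
by_row = pd.DataFrame.from_dict({'r0': [25.8, 28.1], 'r1': [17.9, 14.2]},
                                orient='index', columns=['A', 'B'])
```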

## Importing Data
- `pd.read_csv()` reads a CSV file into a DataFrame

In [None]:
# using BSRN data from data folder for practice
bsrn = pd.read_csv('../data/BSRN_GOB_2019-10.csv')
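
Since the BSRN file is not reproduced here, the sketch below uses a small in-memory CSV instead. The column names `time`, `ghi` and `temp` are hypothetical and do not describe the real BSRN layout; `parse_dates` and `index_col` are standard `read_csv` parameters for turning a timestamp column into a datetime index.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """time,ghi,temp
2019-10-01 00:00,0.0,21.5
2019-10-01 00:01,0.0,21.4
2019-10-01 00:02,1.2,21.4
"""

# Parse the 'time' column as datetimes and use it as the row index
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['time'], index_col='time')
print(df.head())
print(df.dtypes)
```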

#### Exploring DataFrames

| Method | Description |
| :----- | :---------- |
| `df.info()` | Prints a concise summary of the DataFrame (column names, non-null counts, dtypes, memory usage) |
| `df.head(n)` | Returns the first *n* rows of the DataFrame (default 5) |
| `df.tail(n)` | Returns the last *n* rows of the DataFrame (default 5) |
| `df.index` | Returns the row index (the row labels, e.g. a `RangeIndex`); `len(df.index)` gives the number of rows |
| `df.columns` | Returns the column names as an `Index` |
| `df.dtypes` | Returns a Series with the data type of each column, indexed by column name |
| `df.size` | Returns the total number of values (rows × columns) as an `int` |
| `df.shape` | Returns the shape of the DataFrame as a tuple (*rows*, *columns*) |
| `df.values` | Returns the DataFrame values as a NumPy array (`df.to_numpy()` is the recommended alternative) |
| `df.describe()` | Returns a DataFrame of summary statistics (count, mean, std, min, quartiles, max) for each numeric column |
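
A short tour of the methods and attributes in the table, applied to the small example DataFrame from earlier:

```python
import pandas as pd

df = pd.DataFrame({'A': [25.8, 17.9, 23.6, 23.6],
                   'B': [28.1, 14.2, 18.4, 36.2],
                   'C': [16.2, 18.8, 29.9, 22.1],
                   'D': [11.0, 28.0, 27.8, 14.5]})

df.info()                # printed summary: columns, non-null counts, dtypes
print(df.head(2))        # first 2 rows
print(df.shape)          # (4, 4)
print(df.size)           # 16
print(list(df.columns))  # ['A', 'B', 'C', 'D']
arr = df.to_numpy()      # NumPy array, preferred over df.values
stats = df.describe()    # one column of statistics per numeric column
print(stats)
```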

#### Indexing
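- `df.iloc[]`: integer-position indexing (**i**nteger **loc**ation)
    - select multiple rows/columns with slices
    - `df.iloc[row_start:row_stop:row_step, col_start:col_stop:col_step]`
    - the stop position is exclusive, as in ordinary Python slicing
- `df.loc[]`: label-based row (and column) indexing
    - selecting a single row returns a Series
    - the stop label is inclusive
- `df['column_name']`: column indexing
    - column names act as keys to the DataFrame
    - use double brackets `[[]]` to select a list of columns
        - `df[['column1', 'column2']]`
        - with single brackets, `df['column1', 'column2']` raises a `KeyError`
    - `df['column_name'].unique()` returns the unique values in that column as an array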
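
The indexing rules above, sketched on the same example DataFrame. Note the exclusive stop for `iloc` versus the inclusive stop for `loc`.

```python
import pandas as pd

df = pd.DataFrame({'A': [25.8, 17.9, 23.6, 23.6],
                   'B': [28.1, 14.2, 18.4, 36.2],
                   'C': [16.2, 18.8, 29.9, 22.1],
                   'D': [11.0, 28.0, 27.8, 14.5]})

# iloc: integer positions; stop is exclusive -> rows 0-1, columns A-B
block = df.iloc[0:2, 0:2]

# a step works too: every other row, all columns
every_other = df.iloc[::2, :]

# loc: labels; stop is inclusive -> rows 0, 1 and 2
first_three = df.loc[0:2]

row = df.loc[1]          # one row -> a Series indexed by column name
col = df['A']            # one column -> a Series indexed by row label
pair = df[['A', 'C']]    # list of names in double brackets -> a DataFrame
uniq = df['A'].unique()  # distinct values in order of first appearance

# Single brackets with two names look for ONE column named ('A', 'C')
try:
    df['A', 'C']
except KeyError as err:
    print('KeyError:', err)
```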

#### Datetimes
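- `pd.to_datetime()` converts strings (or numbers) to datetime objects
    - parsing is automatic when the strings are in a format pandas recognizes (e.g. ISO 8601 such as `2019-10-01 00:00`); otherwise pass an explicit `format=`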
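
A minimal sketch of both cases: ISO-style strings parse without hints, while a day-first layout is given an explicit `format` using `strftime` codes. `errors='coerce'` is shown as one way to turn unparseable entries into `NaT` instead of raising. The sample timestamps are made up.

```python
import pandas as pd

# ISO-style strings are parsed without any hints
iso = pd.to_datetime(pd.Series(['2019-10-01 00:00', '2019-10-01 00:01']))

# Day-first strings are ambiguous; an explicit format removes the guesswork
custom = pd.to_datetime(pd.Series(['01/10/2019 00:00', '02/10/2019 00:00']),
                        format='%d/%m/%Y %H:%M')

# errors='coerce' turns entries that cannot be parsed into NaT
mixed = pd.to_datetime(pd.Series(['2019-10-01', 'not a date']), errors='coerce')

print(iso.dt.minute.tolist())    # [0, 1]
print(custom.dt.month.tolist())  # [10, 10]
print(mixed.isna().tolist())     # [False, True]
```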