# Notes from chapter readings

[series and data frames](https://meds-eds-220.github.io/MEDS-eds-220-course/book/chapters/lesson-2-series-dataframes.html)

- pandas is built on top of NumPy and exists to wrangle and analyze tabular data

- conventional to import as pd ([conventions]())

## Series: 1-D array of indexed data
- the index is what differentiates a pandas.Series from a NumPy array
- NumPy arrays can still be indexed by integer position, but an explicit, labeled index is not built into the data structure
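
A minimal sketch of this difference, using made-up values and labels (the names `arr` and `s` are only for illustration):

```python
import numpy as np
import pandas as pd

# A NumPy array has no labels; elements are accessed by integer position
arr = np.array([10, 20, 30])
first_from_array = arr[0]

# A pandas.Series carries an explicit index alongside its values
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
first_from_series = s['a']

print(first_from_array, first_from_series)
print(s.index)
```

Both lookups return the same value; the series simply lets the label `'a'` stand in for position 0.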

creating a series:
`pd.Series(data, index=index)`
where `data` can be a list or NumPy array, a dictionary, or a single number, boolean, or string

example from a list: `pd.Series(['EDS 220', 'EDS 222', 'EDS 223'])`

example from a dictionary: create a dictionary using `d = {'key_0': 2, 'key_1': 3, 'key_2': 4}`, then create a series with `pd.Series(d)`; the keys become the index

- you can create a series from a single value, which will be repeated to match the length of the index
ex: `pd.Series(3.0, index=['A', 'B', 'C'])`

- most NumPy operations will work on a series, and you can apply boolean comparisons to them as well
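
The three ways of creating a series, plus a NumPy operation and a comparison, can be sketched roughly as follows. The variable names are made up for illustration, and reading "boolean" here as element-wise comparisons is an assumption on my part:

```python
import numpy as np
import pandas as pd

# Series from a list: the index defaults to 0, 1, 2, ...
from_list = pd.Series(['EDS 220', 'EDS 222', 'EDS 223'])

# Series from a dictionary: the keys become the index
d = {'key_0': 2, 'key_1': 3, 'key_2': 4}
from_dict = pd.Series(d)

# Series from a single value: repeated to match the index length
from_scalar = pd.Series(3.0, index=['A', 'B', 'C'])

# A NumPy function is applied element-wise and the index is preserved
roots = np.sqrt(from_dict)

# A comparison returns a boolean series, which can filter the original
is_large = from_dict > 2
large_only = from_dict[is_large]

print(roots)
print(large_only)
```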

### Identifying missing values
- a missing, NULL, or NA value can be represented with the float value `numpy.nan`; in a series this looks like: `s = pd.Series([1, 2, np.nan, 4, np.nan])`

- you can check for missing values with `s.hasnans`, which returns a single boolean (True if any value is missing)

- alternatively there is `s.isna()`, which checks each element of the series and returns a boolean for each value
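
A small sketch of both checks on the series from above. Summing the mask to count missing values is an extra step not covered in the reading, added here only for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, np.nan])

# True if at least one value in the series is missing
any_missing = s.hasnans

# Boolean series: True where the value is missing
missing_mask = s.isna()

# Summing the mask counts the missing values (True counts as 1)
n_missing = missing_mask.sum()

print(any_missing)
print(missing_mask)
print(n_missing)
```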

In [1]:
## Check in
# Create a pandas.Series named s with four integer values, two of which are -999, indexed by A-D
import pandas as pd
import numpy as np
s = pd.Series([1, 2, -999, -999], index=['A', 'B', 'C', 'D'])
s

A      1
B      2
C   -999
D   -999
dtype: int64

In [2]:
# Use `mask()` to update the series so that -999 values are replaced by NAs
# Mask -999 values in s; masked values default to NaN
s = s.mask(s == -999) # Reassign rather than using inplace=True, unless memory usage is a real concern
s

A    1.0
B    2.0
C    NaN
D    NaN
dtype: float64

## Data Frames
- pandas.DataFrame is the most used pandas object; it represents tabular data as multiple pandas.Series (the columns) that all share the same index

initializing a dataframe
- can be made from a dictionary holding each column's data: `d = {'col_name_1': pd.Series(np.arange(3)), 'col_name_2': pd.Series([3.1, 3.2, 3.3])}`
- then create the dataframe with `df = pd.DataFrame(d)`
- change the index with `df.index = ['A', 'B', 'C']`
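
To make the "columns are series sharing one index" idea concrete, here is a short sketch that builds the same data frame and inspects one column. The variable name `col` is only for illustration:

```python
import numpy as np
import pandas as pd

d = {'col_name_1': pd.Series(np.arange(3)),
     'col_name_2': pd.Series([3.1, 3.2, 3.3])}
df = pd.DataFrame(d)
df.index = ['A', 'B', 'C']

# Selecting a single column returns a pandas.Series
col = df['col_name_1']
print(type(col))

# The column uses the same index as the data frame
print(col.index.equals(df.index))
```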


In [3]:
## Check in
# Initialize dictionary with columns' data 
d = {'col_name_1' : pd.Series(np.arange(3)),
     'col_name_2' : pd.Series([3.1, 3.2, 3.3])
     }

In [4]:
# Create data frame called df
df = pd.DataFrame(d)
df

   col_name_1  col_name_2
0           0         3.1
1           1         3.2
2           2         3.3


In [5]:
# Change index
df.index = ['a','b','c']
df

   col_name_1  col_name_2
a           0         3.1
b           1         3.2
c           2         3.3


In [7]:
# Update column names to C1 and C2
df.columns = ['C1', 'C2']
df

   C1   C2
a   0  3.1
b   1  3.2
c   2  3.3


## Commenting

[commenting](https://meds-eds-220.github.io/MEDS-eds-220-course/book/appendices/comments-guidelines.html)

- comments start with a # followed by a single space, then text
- comments should start with a capital letter
- be consistent with punctuation

comments should reflect the current state of the code they describe
- keep them short and clear, and if you need a longer explanation use markdown cells
- keep it professional

### Special types of comments

In-line comments: on the same line as the code, keep them short and use sparingly

Block comments: multiple comments on separate lines; keep spacing consistent and align them

- explain why, not what, as you progress
- redundancy clutters code
- don't use comments to try to make overcomplicated code readable; just simplify your code
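
A short sketch of these guidelines in practice. The data and the meaning of -999 are hypothetical, chosen only to contrast a "what" comment with a "why" comment:

```python
import pandas as pd

# Hypothetical data: -999 is assumed to be the code for a missing reading
temps = pd.Series([21.5, -999, 19.0])

# Redundant (explains what): Mask temps where temps equals -999
# Better (explains why): -999 marks missing readings, so treat them as NaN
temps = temps.mask(temps == -999)

mean_temp = temps.mean()  # NaN values are skipped by default
print(mean_temp)
```

The block comments sit on their own lines with consistent spacing, while the in-line comment is short and used only once.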

# Overall Takeaways from week 0 readings

pandas
- pandas series and dataframes are useful tools for organizing and accessing data through an index
- most NumPy operations are applicable to pandas objects
- there are often multiple ways to achieve a result with a pandas or NumPy function

commenting
- keep comments consistent in presentation, concise, and up to date
- if you need to explain a lot about your code, consider whether it is as simple as it could be, or consider adding a markdown cell
- your commenting will become more streamlined with coding practice
