<a href="https://colab.research.google.com/github/NIP-Data-Computation/show-and-tell/blob/master/piercel_week1_notes1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Author**: Pierce Lopez <br>
**Date Created**: August 9, 2020 <br>
**Last Updated**: August 10, 2020 <br> 
**Description**: Contains my notes on the Data Analyst lesson: _Introduction to Data Science in Python_. 

# Introduction to Data Science in Python
## Chapter 1: Getting Started in Python
<br>

### Section 1: Dive into Python

**Modules** - collections of related functions, for example:
* matplotlib
* pandas
* numpy

```
import module_name as alias
import module_name.submodule as alias
from module_name import submodule as alias
```
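
As a quick check of the three import forms above, here is a minimal sketch. It uses standard-library modules (`math`, `os.path`) as stand-ins so it runs without installing anything; the aliases `m`, `osp`, and `p` are just names picked for illustration.

```python
# Standard-library modules stand in for pandas/matplotlib here.
import math as m              # import module_name as alias
import os.path as osp         # import module_name.submodule as alias
from os import path as p      # from module_name import submodule as alias

root = m.sqrt(16)                             # call a function through the alias
file_name = osp.basename("data/file.csv")     # call a submodule function

print(root)
print(file_name)
print(osp is p)   # the last two import forms give the same submodule
```

The same pattern gives the usual `import pandas as pd` and `import matplotlib.pyplot as plt`.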

<br>

### Section 2: Creating variables

Basic data types:
* string
* float (real numbers)
* integer
* boolean (True/False)

Variable names...
* can contain letters, numbers, and underscores.
* must start with a letter or an underscore.
* are case-sensitive.

```
# assigning values
variable = value
string_variable = "string value"
StRiNg_vArIaBlE = "StRiNg VaLuE"

# displaying values
print(variable)
print("the next is a float and a variable", 1.23, variable) 
```
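
To make the data types and naming rules concrete, here is a small sketch. All variable names and values are made up for illustration.

```python
course = "Data Science"   # string
price = 1.23              # float
students = 30             # integer
is_open = True            # boolean (capitalized in Python)

# Names are case-sensitive: this is a different variable from `course`.
Course = "a different variable"

# type() reports the data type of each value
for value in (course, price, students, is_open):
    print(value, type(value).__name__)
```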

<br>

### Section 3: Fun with functions

Functions produce outputs from inputs; what comes out depends on what each function does.

```
# format
module_name.function(arguments)

# read csv files
pd.read_csv()

# make line plot
plt.plot(positional_arguments, keyword_arguments)
plt.plot(x, y, label = "Label")

# display made plot
plt.show()
```
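
The difference between positional and keyword arguments can be tried out with Python's built-in functions, as in the sketch below. The built-ins are only stand-ins; `plt.plot()` treats its arguments the same way.

```python
# positional arguments: their order matters
rounded = round(3.14159, 2)          # number first, then digits to keep

# keyword argument: passed by name
descending = sorted([3, 1, 2], reverse=True)

print(rounded)
print(descending)
print("a", "b", "c", sep="-")        # three positional, one keyword argument
```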

## Chapter 2: Loading Data in pandas

<br>

### Section 1: What is pandas?

Some of pandas' capabilities:
* Loading tabular data into DataFrames **(another data type!)**
* Accessing particular parts of data (elements/ rows/ columns)
* Combining data

```
# do not forget to import!
import pandas as pd

# read a csv file (tabular data) into a DataFrame and store the DataFrame
dataframe = pd.read_csv("filename")

# first rows in the DataFrame
dataframe.head()

# dataframe information (i.e. number of rows, column names (with their data types))
dataframe.info()
```
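
Here is a runnable sketch of the steps above. Since no actual file comes with these notes, it reads a small made-up CSV from memory using `io.StringIO`; with a real file, the filename string goes in its place.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a file on disk
csv_text = "name,age\nAda,36\nGrace,45\nAlan,41\n"

dataframe = pd.read_csv(io.StringIO(csv_text))

print(dataframe.head(2))   # head() accepts the number of rows to show
dataframe.info()           # prints row count, column names, and data types
```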

<br>

### Section 2: Selecting columns

```
# selecting a column (two methods)
first_method = dataframe.column_name         # dot notation
second_method = dataframe['column_name']     # bracket notation
```

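**Note:** Dot notation only works when the column name is a valid Python name (only <ins>letters</ins>, <ins>numbers</ins>, and <ins>underscores</ins>, not starting with a number) and does not clash with an existing DataFrame attribute or method. Bracket notation works for any column name.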
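
A small sketch of where dot notation works and where bracket notation is needed. The DataFrame and its column names are made up; `count` is included because it is also the name of a DataFrame method.

```python
import pandas as pd

dataframe = pd.DataFrame({
    "age": [36, 45],
    "first name": ["Ada", "Grace"],   # contains a space
    "count": [1, 2],                  # same name as the DataFrame.count method
})

ages = dataframe.age                  # dot notation works for a simple name
names = dataframe["first name"]       # a space requires bracket notation
counts = dataframe["count"]           # dataframe.count would give the method instead

print(ages.tolist(), names.tolist(), counts.tolist())
```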

<br>

### Section 3: Selecting rows with logic

Recall the boolean data type (logic/truth values). A comparison returns a boolean **(as long as the two values can be compared)**. Comparisons can be done using:

1. '==' - equal to
2. '<' - less than
3. '>' - greater than
4. '<=' - less than or equal to
5. '>=' - greater than or equal to
6. '!=' - not equal

```
# build a logic filter: True for each row whose column value satisfies the condition
dataframe.column_name > value

# select the rows for which the logic filter is True
dataframe[dataframe.column_name > value]
```
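
The two steps can be seen separately in the sketch below, which uses a made-up DataFrame. The name `mask` for the filter is my own choice.

```python
import pandas as pd

dataframe = pd.DataFrame({
    "name": ["Ada", "Grace", "Alan"],
    "age": [36, 45, 41],
})

# Step 1: the comparison gives one boolean per row
mask = dataframe.age > 40
print(mask.tolist())

# Step 2: the booleans pick out the rows to keep
older = dataframe[mask]
print(older)
```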

## Chapter 3: Plotting Data with matplotlib
### Section 1: Creating line plots

```
# do not forget to import!
import matplotlib.pyplot as plt

# line plot
plt.plot(first_positional_argument, second_positional_argument)

# displaying the line plot
plt.show()
```

**Note:** Multiple lines can be drawn on the same plot by calling `plt.plot()` several times before `plt.show()`.
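
A minimal sketch of two lines on one plot. The data values are invented for illustration.

```python
import matplotlib.pyplot as plt

# Made-up data
years = [2017, 2018, 2019, 2020]
cats = [3, 4, 6, 7]
dogs = [5, 5, 4, 6]

# Each plt.plot() call adds one line to the same plot
cat_line, = plt.plot(years, cats)
dog_line, = plt.plot(years, dogs)

# One plt.show() displays both lines together
plt.show()
```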

<br>

### Section 2: Adding text to plots

```
# plot labels
plt.xlabel("x-axis label")
plt.ylabel("y-axis label")

# plot title
plt.title("Plot Title")

# plot legend
plt.plot(x1, y1, label = "First line plot")
plt.plot(x2, y2, label = "Second line plot")
plt.plot(x3, y3, label = "Third line plot")

plt.legend()

# place text in plot
plt.text(x_coordinate, y_coordinate, "Sample message.")
```

Some other useful arguments:
* _fontsize_: numeric value
* _color_ 
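
The text functions above, together with _fontsize_ and _color_, can be combined as in this sketch. The data and label strings are made up.

```python
import matplotlib.pyplot as plt

# Made-up data
x = [1, 2, 3]
y = [2, 4, 8]

plt.plot(x, y, label="Growth")
plt.xlabel("Step", fontsize=12)
plt.ylabel("Value", fontsize=12)

# fontsize and color are keyword arguments of the text functions
title = plt.title("Sample plot", fontsize=16, color="navy")
note = plt.text(2, 4, "Midpoint", fontsize=10, color="red")

plt.legend()   # uses the label given to plt.plot()
plt.show()
```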

<br>

### Section 3: Styling graphs

```
plt.plot(x, y, color = "red", linewidth = 2, linestyle = "--", marker = "o")
```
* _color_: 'red', 'green', 'blue', etc. 
* _linewidth_: numeric value
* _linestyle_: line ('-'), dashed ('--'), dash-dotted ('-.'), dotted (':'), etc.
* _marker_: 'x', '*', square ('s'), circle ('o'), diamond ('d'), hexagon ('h'), etc.

We can also set styles for the plots!

```
# some sample styles
plt.style.use("ggplot")
plt.style.use("seaborn")   # named "seaborn-v0_8" in matplotlib 3.6 and later
```
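
A sketch that puts the styling arguments and a style together. The data is made up, and the check against `plt.style.available` is an extra precaution of mine in case a style name is missing from the installed matplotlib version.

```python
import matplotlib.pyplot as plt

# Only switch style if this matplotlib version provides it
if "ggplot" in plt.style.available:
    plt.style.use("ggplot")

# Styling options are passed as keyword arguments
line, = plt.plot(
    [1, 2, 3], [1, 4, 9],
    color="green", linewidth=2, linestyle="--", marker="o",
)
plt.show()
```

Printing `plt.style.available` lists every style name that can be used.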

## Chapter 4: Different Types of Plots
### Section 1: Making a scatter plot

```
# scatter plot
plt.scatter(data1, data2, alpha = 0.5)
```
Many of the styling arguments from `plt.plot()`, such as _color_ and _marker_, can also be used here. An additional useful argument is _alpha_, which sets transparency (0 is fully transparent, 1 is fully opaque).
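
A short sketch of a scatter plot with transparency, using invented height and weight values.

```python
import matplotlib.pyplot as plt

# Made-up measurements
heights = [150, 160, 165, 170, 180]
weights = [50, 58, 63, 70, 82]

# alpha=0.5 makes each point half transparent, which helps when points overlap
points = plt.scatter(heights, weights, alpha=0.5, color="blue", marker="s")

plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.show()
```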

<br>

### Section 2: Making a bar chart

Another useful argument is _yerr_, which adds error bars.
```
# vertical bar chart
plt.bar(data1, data2, yerr = error_values)

# horizontal bar chart
plt.barh(data1, data2)
```
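
Here is a sketch of both bar chart types with made-up categories, values, and error sizes. The `plt.figure()` call is there so that the second chart starts on a fresh figure.

```python
import matplotlib.pyplot as plt

# Made-up categories, values, and error sizes
pets = ["dog", "cat", "fish"]
mean_weight = [20.0, 4.5, 0.2]
weight_error = [3.0, 0.8, 0.05]

# vertical bars with error bars
bars = plt.bar(pets, mean_weight, yerr=weight_error)
plt.ylabel("Mean weight (kg)")
plt.show()

# horizontal bars on a new figure
plt.figure()
hbars = plt.barh(pets, mean_weight)
plt.xlabel("Mean weight (kg)")
plt.show()
```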

We can also stack bar charts using the _bottom_ argument!
```
# bar chart
bottom_bar = plt.bar(data1, data_dog)

# stacked bar chart
top_bar = plt.bar(data1, data_cat, bottom = data_dog)

# This stacks data_cat on top of data_dog
```
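
A runnable sketch of stacking, with invented counts. Each top bar starts at the height of the bar below it, which is what _bottom_ controls.

```python
import matplotlib.pyplot as plt

# Made-up counts per day
days = ["Mon", "Tue", "Wed"]
data_dog = [3, 5, 2]
data_cat = [4, 1, 6]

# first layer starts at zero
bottom_bar = plt.bar(days, data_dog, label="Dogs")

# second layer starts where the first layer ends
top_bar = plt.bar(days, data_cat, bottom=data_dog, label="Cats")

plt.legend()
plt.show()
```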

<br>

### Section 3: Making a histogram

```
# histogram
plt.hist(data, bins = number_of_bins, range = (low, high), density = True)
```
* _bins_: number of bins
* _range_: (low, high)
* _density = True_: normalizes the histogram so that the bar areas sum to 1
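
To see what the normalization does, the sketch below builds a histogram from made-up data and adds up the bar areas. `plt.hist()` returns the bar heights and the bin edges, which makes this check possible.

```python
import matplotlib.pyplot as plt

# Made-up data
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

# 5 bins covering 0 to 10, so each bin is 2 units wide
heights, edges, _ = plt.hist(data, bins=5, range=(0, 10), density=True)

# With density=True, the bar areas (height * width) add up to 1
bin_width = edges[1] - edges[0]
total_area = sum(heights) * bin_width
print(total_area)

plt.show()
```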