# Core Statistics Using Python
### Hana Choi, Simon Business School, University of Rochester


# Getting Started with Python
## Topics covered

- Introduction to Python 
- Jupyter notebooks
- Commenting in Python
- Basic Python functionality
- Using Python packages
- Loading data files (CSV, Excel)
 
## My limited Python goals for this course

- My plan is to give a MINIMAL introduction to Python so we can perform the statistical techniques introduced in this class.
- Note that I am not expecting you to learn to program in this course (or planning to teach you how to do so!).
- You should, however, be able to modify the simple code I provide to tackle other stats-related problems outside of this class.

* * *

# Welcome to Jupyter notebooks!
## Basic functions in a little diagram
<img src='https://raw.githubusercontent.com/michhar/python-jupyter-notebooks/master/general/nb_diagram.png' alt="Diagram of basic Jupyter notebook functions" align="center">

## Useful shortcuts
- A complete list is [here](https://www.cheatography.com/weidadeyue/cheat-sheets/jupyter-notebook/), but these are my favorites. Jupyter has a *command* mode and an *edit* mode, much like the Unix editor `vi`/`vim`. `Esc` takes you into command mode, and `Enter` (when a cell is highlighted) takes you into edit mode. Try some of the shortcuts below.

Mode  |  What  | Shortcut
------------- | ------------- | -------------
Command (Press `Esc` to enter)  | Run cell | Shift-Enter
Command  | Add cell below | B
Command | Add cell above | A
Command | Delete a cell | D, D (press `D` twice)
Command | Go into edit mode | Enter
Edit (Press `Enter` to enable) | Run cell | Shift-Enter
Edit | Indent | Ctrl-]
Edit | Unindent | Ctrl-[
Edit | Comment section | Ctrl-/
Edit | Function introspection | Shift-Tab

## Cells
- A block (like this) is called a cell.
- A cell can be either
    - a Markdown/Text cell that contains formatted text, images, etc., or
    - a Code cell that contains Python code and possibly its output.
- The current cell (block) is a Markdown/Text cell, which is white. 
- The cell right below is a Code cell, which is grey.

In [None]:
print('This line is Python code.')

# Commenting
- When writing code, you should leave detailed comments so that others, including yourself (now or later), can understand your code.
- We use `#` to start a comment, and Python will ignore anything to the right of it when executing the code.
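- We can also write multi-line comments by starting and ending with triple quotes (`'''` or `"""`). Strictly speaking, this creates a string that Python evaluates and then discards, so it behaves like a comment as long as it is not the last line of a cell (Jupyter would display it as output).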
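
As a small illustration of both comment styles, the sketch below uses a made-up helper function, `seconds_in_days` (a hypothetical name, not part of the course code). When a triple-quoted block is the first statement inside a function, Python stores it as the function's documentation (its "docstring").

```python
def seconds_in_days(days):
    """Return the number of seconds in the given number of days.

    This triple-quoted string documents the function's purpose, input, and output.
    """
    # A regular comment: Python ignores everything to the right of '#'
    return days * 24 * 60 * 60  # hours * minutes * seconds


print(seconds_in_days(7))        # 604800
print(seconds_in_days.__doc__)   # shows the documentation text
```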

In [None]:
# This is a comment. Try executing this cell block, and nothing will be run.

In [None]:
'''
This is a multi-line comment.
We can use this to write a section of things that are not intended to be run by Python. 
For example, we can write a paragraph that documents a function's purpose, inputs, and outputs.
'''

# Try executing this cell block. Only the line below will be executed.
print('This line is Python code.') 

# Using Python as a basic calculator

In [None]:
# We can use Python as a calculator
5 + 6 

In [None]:
# Subtraction
10 - 4 # Note: you can leave a comment to the right of this code

In [None]:
# Multiplication and division
# Jupyter notebook only displays the last line's output in each cell

3 * 5 # This output will not be shown in the console.
10 / 2 # This last line's output will be shown.

In [None]:
# We can use the 'print' function to display multiple outputs
print(3 * 5)
print(10 / 2)

In [None]:
# Exponentiation, or power, which we write as 4^2 in some other languages
# (in Python, ^ means bitwise XOR, so use ** for powers)
print(4 ** 2)

In [None]:
# We can also define constants/values that we want to store in memory
seconds_in_a_day = 24 * 60 * 60
seconds_in_a_day

In [None]:
# We can then manipulate the object further if we want
seconds_in_a_week = 7 * seconds_in_a_day
seconds_in_a_week
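
As a compact recap of the calculator cells, the sketch below repeats the basic operators and adds two that are not used above, floor division (`//`) and remainder (`%`), along with the usual order of operations. The specific numbers are chosen only for illustration.

```python
# Division with / always returns a decimal number (float)
print(10 / 2)       # 5.0

# Floor division (//) rounds down to a whole number; % gives the remainder
print(7 // 2)       # 3
print(7 % 2)        # 1

# Multiplication is done before addition unless parentheses say otherwise
print(2 + 3 * 4)    # 14
print((2 + 3) * 4)  # 20

# Exponentiation uses **, not ^
print(4 ** 2)       # 16
```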

# Using Packages in Python
## What are packages? 
- Python uses packages to deliver most of its functionality.
- A package is a directory that contains multiple modules (Python `.py` files). 
- These modules specify functions, methods, types, etc. to perform specific types of tasks. 
- There are lots of packages available. Some of the packages we will be using in this course include: 
    - numpy: numerical arrays and data manipulation
    - pandas: tabular data (DataFrames)
    - matplotlib: data visualization
    - statsmodels: statistical functionality
    - scipy: another stats package
    - sklearn: a machine learning package

## How do you install a package?

- Students can follow the steps outlined in a separate document available on our course site, which includes instructions for installing packages within a Conda virtual environment. <br>
<br>

- The following link provides a general user guide for managing packages:

    [https://conda.io/docs/using/pkgs.html](https://conda.io/docs/using/pkgs.html)
   

## Importing packages

- Once a package is installed, it needs to be "called" or "imported" in every Python session (e.g., whenever you launch a new Jupyter notebook session). <br>
<br>

- You can
 
    - Import a package (numpy) with its own name
    
      `import numpy` <br>
      <br>
    - Import a package (numpy) with a different name (np) to make it simpler to write
    
      `import numpy as np` <br>
      <br>
        
    - Import a particular function (array) or module from a package (numpy)
   
      `from numpy import array`
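
The three import styles can be compared side by side. The sketch below uses Python's built-in `math` module as a stand-in for numpy (an assumption made only so that the example runs without installing anything); the same patterns apply to any package.

```python
# Style 1: import the module under its own name
import math

# Style 2: import the module under a shorter name (alias)
import math as m

# Style 3: import one particular function from the module
from math import sqrt

# All three lines call the same square-root function
print(math.sqrt(16))  # 4.0
print(m.sqrt(16))     # 4.0
print(sqrt(16))       # 4.0
```

With the first two styles you always write the module name (or alias) before the function, which makes it clear where the function comes from.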

## Here are the packages/modules we need for this notebook

In [None]:
# Importing the pandas package under the name "pd"
import pandas as pd

# Importing the os package
# This is a built-in module (so you don't have to install it separately) with functions for interacting with the operating system.
import os

# Loading Data


- Remember, you need to run the code block above to import the packages we need, every time you launch a new Python session or a Jupyter notebook session.<br>
<br>

- If you are getting errors somewhere below, it is likely because you did not run the code cell above!

## Paths

- Paths are important for accessing files, like datasets.
- In Python, there are two main paths to keep track of.
    1. The path where your Python is running (called "working directory").
    2. The path where your data file is saved (called "file path").
- If you don't explicitly tell Python where your data file is located (file path), Python will assume that it is in your "working directory".
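
To see how a bare file name is resolved, the minimal sketch below asks Python which full path it would use for `HoldTimes.csv`. It relies only on the built-in `os` module and does not require the file to exist.

```python
import os

# The directory where Python is currently running
cwd = os.getcwd()
print("Working directory:", cwd)

# A bare file name is interpreted relative to the working directory
relative_name = "HoldTimes.csv"
resolved = os.path.abspath(relative_name)
print("Python will look for:", resolved)

# Check whether the file is actually there (True or False)
print("File found:", os.path.exists(resolved))
```

If the last line prints `False`, either move the data file into the working directory or give Python the full file path.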

In [None]:
# Getting your current working directory
cwd = os.getcwd() 
cwd

## Loading CSV files

### Method 1: Save the data file directly to your working directory
- Then, even if you don't explicitly tell Python where your dataset is, Python will look for it in your current working directory (and that's where we saved the data file!).

In [None]:
# Let's try loading a csv file "HoldTimes.csv"

hold_times = pd.read_csv('HoldTimes.csv')
hold_times

### Method 2: Tell Python "explicitly" where your data file is
- `hold_times = pd.read_csv("<file_path>/HoldTimes.csv")`
- `<file_path>` is where you saved the data, e.g., `"~/Dropbox/Data"` (Mac) or `"C:/Users/<username>/Dropbox/Data"` (Windows).
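
One way to build such a path without worrying about slashes is sketched below. The folder `~/Dropbox/Data` is only a placeholder taken from the example above, so replace it with your own location.

```python
import os

# Placeholder folder: replace with the folder where you saved the data
data_folder = os.path.expanduser("~/Dropbox/Data")  # "~" becomes your home folder

# Join the folder and the file name using the right separator for your system
file_path = os.path.join(data_folder, "HoldTimes.csv")
print(file_path)

# Check that the file is really there before trying to load it
print(os.path.exists(file_path))
```

If the last line prints `True`, then `pd.read_csv(file_path)` should be able to load the file.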

In [None]:
# Let's try specifying the file path "explicitly"
# Below is "my" file path; you should specify yours instead.

hold_times2 = pd.read_csv("/Users/hanachoi/Dropbox/teaching/core_statistics/PythonCode/HoldTimes.csv")
hold_times2

## Loading Excel files

- What if we want to load an Excel file instead? Converting an Excel file to a CSV file is cumbersome.
- Fortunately, the pandas library also includes a `read_excel` function, which reads Excel files into a DataFrame (just like `read_csv` did with CSV files!).
- Our task, then, is to find and use the helpful libraries and functions that are already available.
- The simplest way to find them is by searching on Google or using AI tools, such as ChatGPT. Just ask, "How to read an Excel file in Python?"

<img src="pic_chatgpt.png" width="500">


In [None]:
# Let's try loading an Excel file named "excel_sample_data.xlsx"

led_bulb = pd.read_excel('excel_sample_data.xlsx', sheet_name="LEDBulb")
print(led_bulb.head()) # Displays the first few rows of the DataFrame