In this short exercise, you will be introduced to installing and loading Python packages.
You will also import a dataset, estimate a regression model, and create a graphic.
This tutorial parallels the R version of the same exercise. Try to follow along, but don’t worry
if you don’t understand everything right away. We will learn more about these steps later in the course.
If you can get the code below to run, you are good to go for the first day of class!

###Installing and Loading Packages


Once you have installed Python, you can install packages using the pip command in the terminal. The list below includes openpyxl, which pandas needs in order to read .xlsx files. To install everything, enter the following command:

In [None]:
pip install scikit-learn pandas matplotlib seaborn statsmodels numpy openpyxl

Next, try importing some of these libraries. You can run the code by highlighting the lines and selecting "Run" (or Shift + Enter in Jupyter). You can also click on the play button next to the code cell.

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import sklearn  # General import of scikit-learn


If you don't receive any errors, you have successfully installed and loaded the necessary packages.
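If you would rather check the installation explicitly, a small sketch like the following reports which packages Python can find. It uses only the standard library; the list mirrors the imports above, and the variable names are illustrative.

```python
import importlib.util

# Import names used in this tutorial (scikit-learn is imported as "sklearn").
packages = ["pandas", "numpy", "matplotlib", "seaborn", "statsmodels", "sklearn"]

# find_spec returns None when a top-level package cannot be located.
status = {name: importlib.util.find_spec(name) is not None for name in packages}

for name, found in status.items():
    print(f"{name:<12} {'installed' if found else 'MISSING'}")
```

Any package reported as MISSING can be installed with pip and the check run again.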

###Importing Data

Import the sample Excel file: ansc_quart1.xlsx. First, save the file from the course website to your machine. Next, use the pd.read_excel function to import the file into your Python environment, making sure to reference the file path on your machine.

Pro Tip:
Make sure to use / rather than \ when specifying a file path, because Python treats \ inside a regular string as the start of an escape sequence. Alternatively, in Jupyter Notebook, you can navigate to the file location, right-click the file, and select "Copy Path" to get the correct path.
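For a concrete illustration of the tip, the sketch below converts a Windows-style path to the forward-slash form. The folder C:\Users\student\Downloads is a made-up example, not the actual location of the course file.

```python
from pathlib import PureWindowsPath

# Hypothetical Windows location; replace it with the path on your machine.
# The r"..." prefix makes a raw string, so backslashes are kept literally.
windows_path = PureWindowsPath(r"C:\Users\student\Downloads\ansc_quart1.xlsx")

# as_posix() rewrites the path with forward slashes.
file_path = windows_path.as_posix()
print(file_path)
```

The resulting string can be passed to pd.read_excel in place of the FILEPATH placeholder.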

In [None]:
ansc_quart1 = pd.read_excel("FILEPATH/ansc_quart1.xlsx")

print(ansc_quart1)

Sample Dataset:
| x  | y    |
|----|------|
| 10 | 8.04 |
| 8  | 6.95 |
| 13 | 7.58 |
| 9  | 8.81 |
| 11 | 8.33 |
| 14 | 9.96 |
| 6  | 7.24 |
| 4  | 4.26 |
| 12 | 10.84|
| 7  | 4.82 |
| 5  | 5.68 |
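
If the Excel file is not at hand, the same data can be typed in directly. This is only a convenience sketch that copies the values from the sample table above; it is not a replacement for importing the course file.

```python
import pandas as pd

# Values copied from the sample dataset table.
ansc_quart1 = pd.DataFrame({
    "x": [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
    "y": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
})

print(ansc_quart1.shape)       # (rows, columns)
print(ansc_quart1.describe())  # summary statistics for x and y
```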


###Regression Analysis

Take a quick look at the dataset above. Let’s use a statistical technique called regression to predict y using the values in x. In regression terminology, we call this “a regression of y on x.”

In [None]:
# Define independent and dependent variables from the imported dataset
X = ansc_quart1[['x']]
y = ansc_quart1['y']

# Add a constant (intercept term) for statsmodels
X = sm.add_constant(X)

# Fit the regression model
model = sm.OLS(y, X).fit()

# Print the summary
print(model.summary())
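
To see what the fitted model is estimating, the slope and intercept of a one-variable regression can also be computed by hand: the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept makes the line pass through the two means. The sketch below does this with NumPy on the sample values shown earlier (retyped here so the block runs on its own).

```python
import numpy as np

x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# Slope: sum of cross-deviations divided by sum of squared x-deviations.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: forces the line through the point (mean of x, mean of y).
intercept = y.mean() - slope * x.mean()

print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
```

For these data the intercept is about 3.0 and the slope about 0.5, which should agree with the coef column of the statsmodels summary.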



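Predictions (.fitted) and residuals (.resid) for each observation, together with leverage (.hat), the residual standard deviation estimated with that observation dropped (.sigma), Cook's distance (.cooksd), and the standardized residual (.std.resid):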

| y    | x  | .fitted  | .resid    | .hat     | .sigma   | .cooksd  | .std.resid |
|------|----|----------|-----------|----------|----------|----------|------------|
| 8.04 | 10 | 8.001000 | 0.0390000 | 0.1000000 | 1.311535 | 0.0000614 | 0.0332440  |
| 6.95 | 8  | 7.000818 | -0.0508182 | 0.1000000 | 1.311479 | 0.0001042 | -0.0433179 |
| 7.58 | 13 | 9.501273 | -1.9212727 | 0.2363636 | 1.056460 | 0.4892093 | -1.7779327 |
| 8.81 | 9  | 7.500909 | 1.3090909 | 0.0909091 | 1.218483 | 0.0616370 | 1.1102882  |
| 8.33 | 11 | 8.501091 | -0.1710909 | 0.1272727 | 1.310017 | 0.0015993 | -0.1481007 |
| 9.96 | 14 | 10.001364 | -0.0413636 | 0.3181818 | 1.311496 | 0.0003829 | -0.0405092 |
| 7.24 | 6  | 6.000636 | 1.2393636 | 0.1727273 | 1.219936 | 0.1267565 | 1.1019046  |
| 4.26 | 4  | 5.000455 | -0.7404545 | 0.3181818 | 1.272721 | 0.1226999 | -0.7251598 |
| 10.84 | 12 | 9.001182 | 1.8388182 | 0.1727273 | 1.099742 | 0.2790296 | 1.6348730  |
| 4.82 | 7  | 6.500727 | -1.6807273 | 0.1272727 | 1.147055 | 0.1543412 | -1.4548813 |
| 5.68 | 5  | 5.500546 | 0.1794545 | 0.2363636 | 1.309605 | 0.0042680 | 0.1660660  |





###Visualizing Regression with a Scatterplot

To better understand regression, we can visualize the data using a scatter plot with a regression line.

In [None]:
sns.lmplot(x="x", y="y", data=ansc_quart1, ci=None)  # ci=None hides the confidence band
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Regression of Y on X")
plt.show()
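
The same picture can be drawn with matplotlib alone, which also makes it easy to save the figure to a file. The sketch below is an alternative to the seaborn call, not the course's method: it retypes the sample values, fits the line with np.polyfit, and writes to a hypothetical file name, regression_y_on_x.png.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# Degree-1 polynomial fit; coefficients are returned highest power first.
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots()
ax.scatter(x, y, label="observations")
x_line = np.linspace(x.min(), x.max(), 100)
ax.plot(x_line, intercept + slope * x_line, color="red", label="fitted line")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title("Regression of Y on X")
ax.legend()

# Hypothetical output file name; the image is written to the working directory.
fig.savefig("regression_y_on_x.png", dpi=150)
```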




[Figure: scatterplot of Y against X with the fitted regression line, titled "Regression of Y on X"]

Bam! Congrats on your first Python code!
You just cleared your first hurdle to getting into the world of Python!
