# Lecture 1: Introduction to Datahub and Jupyter
## 20 Feb 2021

### Table Of Contents
* [Introduction](#section1)
* [What will we learn?](#section2)
* [Project Checkpoint Submissions](#section3)


### Hosted and maintained by the [Student Association for Applied Statistics (SAAS)](https://saas.berkeley.edu).



![Jupyter Image](https://jupyter.org/assets/homepage.png)

<a id='section1'></a>
# Introduction
Hello! Welcome to Career Exploration Fall 2020!

This is an introductory notebook for practicing with Datahub and going over the semester schedule.

Datahub is a fantastic resource because it lets us use Python and common packages without installing anything locally and risking a broken setup.

Run the code cell below by clicking on it and pressing `Shift+Enter` (`Shift+Return` on a Mac). These are common packages we will import throughout the semester.

In [None]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
%matplotlib inline

Ordinarily you would need to install Python and every package yourself, and mismatched versions or failed installations often cause problems. Datahub bypasses all of these by providing an environment you can use online!
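
If you are curious which versions the environment provides, the sketch below lists them using only the standard library. The helper name `installed_versions` and the list `course_packages` are made up for this example, and the package names are the PyPI distribution names (an assumption about how they were installed); a package that is not installed shows up as "not installed".

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as published on PyPI (scikit-learn is imported as sklearn).
course_packages = ["pandas", "numpy", "scikit-learn", "matplotlib"]
for name, ver in installed_versions(course_packages).items():
    print(f"{name}: {ver or 'not installed'}")
```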

### Steps to download from Slack and unzip on Datahub
1. Make sure you are in the Slack workspace, and navigate to the **career-exploration-spring2020** channel
2. Download the LectureX.zip file (X represents the lecture number, for example Lecture1.zip)
3. Open Datahub at http://datahub.berkeley.edu/ and log in with your Berkeley account
4. Click Upload at the top right
5. Upload LectureX.zip
6. Select 'New' at the top right of the Datahub screen, and select Terminal from the drop-down
7. In the terminal, enter:
  * `unzip LectureX.zip`
8. Open the LectureX folder and open the ipynb file inside it
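
If you would rather stay inside a notebook than open a terminal, the unzip step can also be sketched in Python with the standard `zipfile` module. This is an alternative to the `unzip` command, not a replacement for the steps above; the helper name `unzip_lecture` and the file name `Lecture1.zip` are placeholders chosen for this example.

```python
import zipfile
from pathlib import Path


def unzip_lecture(zip_path, dest="."):
    """Extract every file in zip_path into dest and return the archived names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# "Lecture1.zip" is an example file name; replace it with the file you uploaded.
archive_path = Path("Lecture1.zip")
if archive_path.exists():
    print(unzip_lecture(archive_path))
else:
    print(f"{archive_path} not found; upload it to Datahub first.")
```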


We will share files mainly through Slack. Remember to upload the entire zip file to Datahub and unzip it there.

<a id='section2'></a>
# What will we learn?

This semester we will go over many topics at a relatively high level. We begin by introducing Jupyter notebooks (what you are reading right now!) and use them to teach most of our lectures. Jupyter notebooks are incredibly useful because they let you run separate chunks of code one at a time, without having to run the entire program at once.

We aim to go through the following topics for the semester.

<table class="table table-bordered table-hover table-condensed">
<thead><tr><th title="Field #1">Date</th>
<th title="Field #2">Lecture</th>
</tr></thead>
<tbody><tr>
<td>2/20</td>
<td>L1 Logistics and Datahub</td>
</tr>
<tr>
<td>2/27</td>
<td>L2 Python</td>
</tr>
<tr>
<td>3/6</td>
<td>L3 Numpy/Pandas + Visualizations</td>
</tr>
<tr>
<td>3/13</td>
<td>L4 Data Cleaning and Exploratory Data Analysis</td>
</tr>
<tr>
<td>3/20</td>
<td>L5 Intro to Linear Algebra and Linear Regression</td>
</tr>
<tr>
<td>4/3</td>
<td>L6 Intro to Machine Learning</td>
</tr>
<tr>
<td>4/10</td>
<td>L7 Bias Variance, Regularization</td>
</tr>
<tr>
<td>4/17</td>
<td>L8 Decision Trees, Random Forest, Boosting</td>
</tr>
<tr>
<td>4/24</td>
<td>L9 Neural Networks</td>
</tr>
<tr>
<td>5/1</td>
<td>L10 Advanced Topics</td>
</tr>
</tbody></table>

As you can see, the semester is packed full of various concepts, from statistical ideas such as bias and variance to machine learning concepts like neural networks and decision trees.

The semester is structured so that you will be able to accumulate foundational skills, learn more advanced concepts, and apply them to a final Kaggle competition. 

The course material is being written by our lovely Education committee! You will get to meet them over the course of the semester as we are rotating lecturers.

This schedule is quite ambitious and fast-paced, as it aims to cover a very large amount of material.

**Please let us know if you ever have feedback, have questions, or are just looking for some more help! We are all happy to help out. You can always reach us over Slack.**

**This material is hard!**

We also hold many workshops and socials over the semester! We hope that you are all able to come participate and have a great time!

<a id='section3'></a>
# Project Checkpoint Submissions

This semester we are going to split the Final Project into several checkpoints instead of having weekly homework assignments. This creates a fun and low-stress way of staying on top of the material!

FINAL CHECKPOINT: The Friday before dead week (12/3)


<a id='section3'></a>
# Datahub Guide

Datahub will be the place where all your code will reside. In some ways this is your development environment! You should always be familiar with the environment you program in, and here are some exercises to help you get started!

**Double click** a cell to edit the contents. 

There are two main kinds of cells we expect you to know: Markdown and Code. You can run a cell by pressing **Ctrl-Enter**.

* Code: Self-explanatory. This is where your code is stored.
* Markdown: All the text. There is also some LaTeX integration! w $\alpha$ o! Just put `$` around your LaTeX expression.

The **kernel** is something you might find crashing a lot in the future. If the kernel is not running, your code will not run. Look at the top right of the notebook to see the status of the kernel.

As the wise IT guys always say, try turning it off and on again if it doesn't work. To restart the kernel, open the "Kernel" dropdown menu near the top left of the notebook.
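
One thing to keep in mind: a restart clears every variable and import, so earlier cells have to be run again. Here is a minimal sketch for checking what is missing afterwards. `missing_names` is a hypothetical helper written for this example, not something Jupyter or Datahub provides.

```python
def missing_names(names, namespace):
    """Return the names that are not defined in the given namespace."""
    return [name for name in names if name not in namespace]


# In a notebook, globals() holds everything defined by the cells run so far.
needed = ["pd", "np", "LinearRegression", "plt"]
print(missing_names(needed, globals()))
```

An empty list means the import cell has been run since the last restart; any names that are printed tell you to run it again.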

The **toolbar** at the top of the notebook also holds some pretty useful tools! 

<h3>Q1</h3>

Look in the toolbar and try to find the **Run All Above** command.

What menu is it in? _________

<h3> Q2 </h3>

Delete the cell below

*DELETE ME PLEASE*

<h3> Q3 </h3>

**In the cell below, write a number 1 to 100 inclusive.**

*Make me a number*

<h3> Q4 </h3>

In the cells below, write your name, major, a fun fact about yourself, a short game, and a quick survey. Make sure to hit Save (File > Save and Checkpoint) or Ctrl/Command-S after you've finished writing. 

**Name**: 

**Major**: 

**Fun Fact**: 

<h3> Q5 </h3>

Run the cell below to make sure everything runs fine. 

In [None]:
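# Assumes the import cell at the top of the notebook has already been run
# (np, LinearRegression, and plt come from there).

# True relationship: y = 4x + 2
func = lambda x: 4*x+2
samples = 100
data_range = [0, 1]

# Sample x uniformly over data_range and add Gaussian noise (std 3) to y
x = np.random.uniform(data_range[0], data_range[1], (samples, 1))
y = func(x) + np.random.normal(scale=3, size=(samples, 1))

# Fit a line to the noisy samples and predict at the two endpoints
model = LinearRegression().fit(x, y)
predictions = model.predict(np.array(data_range).reshape(-1, 1))

# Plot the samples, the true line, and the fitted line
fig, ax = plt.subplots(figsize=(12, 8))
plt.scatter(x, y)
plt.plot(data_range, list(map(func, data_range)), label="Truth")
plt.plot(data_range, predictions, label="Prediction")
plt.xlabel("X")
plt.ylabel("Y")
plt.title("Linear regression")
plt.legend()
plt.show()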
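
For the curious: with a single feature, the line that ordinary least squares fits can be written out by hand. The sketch below is a plain-Python illustration of that formula, not how scikit-learn implements it internally; `fit_line` is a name made up for this example, and the data is generated with the same slope, intercept, and noise level as the cell above.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


random.seed(0)
xs = [random.uniform(0, 1) for _ in range(100)]
ys = [4 * x + 2 + random.gauss(0, 3) for x in xs]
slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Because the noise is large compared with the range of x, the estimated slope and intercept can land noticeably away from the true values of 4 and 2, which is why the "Prediction" line in the plot does not sit exactly on the "Truth" line.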