# Intro to Python / Jupyter Notebooks / Google Colab

**Fall 2024 - Instructor:  Chris Volinsky**

**Teaching Assistants: Aditya Deshpande, Stuti Mishra**


## Python

Python has become one of the most frequently used languages in the world of data science due to the ease of applying it to a large number of data science problems. Stakeholders in companies in different industries and of various sizes consistently choose Python as the most flexible and powerful option. If you are going to learn one language, Python is a great choice.

Additionally, Python benefits from a modular design, which gives access to many libraries that extend the core functionality of the language. We will be using some of these core libraries, such as:

* `numpy`
* `pandas`
* `matplotlib`
* `seaborn`

and many others.


For this class we assume that you are coming in with basic programming skills and knowledge of core Python, including concepts of variable types, creating functions, conditionals, and looping.  
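As a rough self-check, you should be able to read a snippet like the one below and predict what it prints. It is only an illustrative sketch: the function name `describe_number` and the sample values are made up for this example and do not come from the course materials.

```python
def describe_number(n):
    """Return a short label for an integer using conditionals."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    elif n % 2 == 0:
        return "even"
    else:
        return "odd"

# Variables of a few core types (values are hypothetical).
language = "Python"        # str
values = [-3, 0, 4, 7]     # list of int
threshold = 2.5            # float

# Loop over the list and call the function on each element.
labels = []
for v in values:
    labels.append(describe_number(v))

print(language, labels, threshold)
```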

Please work through the notebook [PythonBasics.ipynb](https://colab.research.google.com/drive/1yDstZniuQK_lDTZ1TAUF9oIxxwC4PJv1?usp=drive_link) to review some of these core programming concepts. You should be comfortable with all of this material at the start of class. We will not be teaching these concepts, so if you are not familiar with the content of that notebook, you should consider taking an Intro to Programming class as a prerequisite.

If you need to upskill on Python, there are many good YouTube videos and tutorials available on the web.


## Jupyter Notebooks

The main way data scientists use Python is through _Jupyter Notebooks_ (formerly called IPython notebooks). A notebook is an interactive computational environment in which you can combine code execution, rich text, mathematics, plots and rich media. Notebooks are great because the code lives side by side with commentary, descriptions, and visualizations, which lets the data scientist create a document that others can walk through and execute.

The notebook form is a great way to code. It separates code into *cells*, where each cell can contain a functional chunk of code and can be executed separately. Notebook cells allow you to create and share documents that contain live code, equations, and visualizations. Documentation lives side by side with the code in text cells written in `markdown`, a special syntax that makes communication easy.

Jupyter Notebooks have support for over 40 programming languages, including those popular in Data Science such as Python and R.  We will be using Python exclusively for our examples in this class, but feel free to program in other languages if you know them.  

## Google Colab and IDEs

Programmers create and interact with Python notebooks through an _Integrated Development Environment_ or **IDE**. In this class, we will be using **Google Colab**, a cloud-based IDE that has many good features. The best feature of Colab for the purpose of the class is that NYU provides the environment, compute and storage for Colab, so you don't have to provide this yourself. It is also an advantage to have everyone working in the same environment.

<br>
<br>

![Colab_screen.png](https://drive.google.com/uc?export=view&id=1a9pYgIdlT1QrzQF1m6wvfP_0GvO67H7D)

<br>
<br>

Google Colab is usable via any browser, so no matter where you access it from, you have access to your files and a consistent environment. You can also share your notebooks easily - with fellow students or with me - via Google Drive.

NYU and Stern IT provide support for Colab, so if you have any issues with the platform, please reach out to them.  


### Integration with Drive

Colaboratory is integrated with Google Drive. It allows you to share, comment, and collaborate on the same document with multiple people:

* The **SHARE** button (top-right of the toolbar) allows you to share the notebook and control permissions set on it.

* **File->Save a Copy in Drive** creates a copy of the notebook in *your* Google Drive.

* **File->Revision history** shows the notebook's revision history - in case you accidentally delete something or want to revert to a previous working version.

**IMPORTANT** : Save every class notebook and module IN YOUR OWN GOOGLE DRIVE as soon as you open it.  That way you have your own copy to work on, edit and share.




### Other IDEs

Some of you with previous coding experience might be familiar with other IDEs that run locally on your laptop or in another environment. There are many common tools available for coding, including [PyCharm](https://www.jetbrains.com/pycharm/), [Visual Studio Code](https://code.visualstudio.com/), _Jupyter Notebook_, _Anaconda_, and _Spyder_. These have mostly the same functionality but different looks and feels. The main difference is that these run _locally_, on your machine, and not in the cloud, so you are responsible for the compute and storage. If you have a large dataset, or require a lot of compute, running locally **might** be faster and more convenient in some cases than running in the cloud.

Students are welcome to use any IDE they like, as long as they can output HWs and project code as a .ipynb file.  



## Navigating a Python notebook (in Colab or elsewhere)

Jupyter notebooks are made up of **cells.**  There are two basic types of cells in a notebook: **text cells** for documentation, description and comments, and **code cells** that contain Python code.  Each time you add a cell (hover over the bottom of a cell in Colab) you can choose to add a text or code cell.






### Text cells and markdown

This is a text cell!!

You create a text cell by clicking the `(+Text)` button at the end of any existing cell or by using the menu item `Insert=>Text Cell`.


Now, you can write and do text formatting:

- Hashtags (number sign \#) are useful for titles.  Start a cell with hashtag text and Colab will  create a table of contents on the left menu bar. Single hashtags are main sections, and double, triple, or quad hashtags (####) are sub-sections.
- Surround words with \*asterisks\* or \_underscores\_ to italicize things: _underscore example_, *asterisk_example*.
- Use a backslash (\\) to keep special characters from acting special (see the code in the preceding item).
- Double **asterisks** (\**)  make things bold.
- Square Brackets [ ] are for links and images.
- Links can be created using square brackets containing a label, followed by the URL in parentheses, such as [Link text](URL Here).    Example: [My Home Page](http://chrisvolinsky.com)
- You can create bullet lists by starting a line with a dash \- or an asterisk \* (or by using the formatting tools at the top of the text cell).  (Just like this list!)
- You can format code in a text cell using backticks \` : such as `this is a line of code`
- Type a colon \: to get a menu of emojis ❗😀 🌮 🧲 👸 💻
- Also, HTML code is allowed. Some resources can be found in [HTML w3schools](http://www.w3schools.com/html/html_examples.asp) <p style="color:red;">This is text formatted with HTML.</p>

- And you can write math with $\LaTeX$, a typesetting language for the production of scientific documents (https://www.latex-project.org/). You use LaTeX in Jupyter notebook markdown cells by wrapping the LaTeX code in dollar signs, for example $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. If you don't know how to write a symbol, you can go to [Detexify](http://detexify.kirelabs.org/classify.html).


Some great Markdown references:
- [Markdown Cheatsheet](https://www.markdownguide.org/cheat-sheet/)
- [Guide to Markdown](/notebooks/markdown_guide.ipynb)

### Sessions

When you run code, the code is actually executed in a _session_. You can do bad things in a session: you can get it stuck in an endless loop, crash it, corrupt it, etc. And you probably will do all of these things :).

So sometimes you might have to interrupt your session or restart it. Use the "Runtime" menu to interrupt or restart the session, re-run your notebook, etc.

Also, before submitting a homework or a project, make sure to Restart and Run All. This will create a clean run of your project, without any side effects left over from development. We want you to submit the homeworks with output, and a clean run also confirms that we can execute your code properly.
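The sketch below illustrates why a clean run matters. It is not how Colab works internally: a plain dictionary named `session` (a made-up stand-in) plays the role of the variables a session remembers, and the two functions play the role of notebook cells.

```python
def cell_1(state):
    # "Cell 1": set the starting value.
    state["balance"] = 100

def cell_2(state):
    # "Cell 2": modify the value in place.
    state["balance"] = state["balance"] * 2

# During development, cell 2 gets run twice by accident.
session = {}
cell_1(session)
cell_2(session)
cell_2(session)
print(session["balance"])

# "Restart and Run All": fresh state, each cell run once, top to bottom.
fresh_session = {}
cell_1(fresh_session)
cell_2(fresh_session)
print(fresh_session["balance"])
```

The first `print` shows 400 and the second shows 200. The cells are identical; only the history of the session differs, which is exactly what a clean run removes.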



### Code cells

In a code cell, we can type any Python commands and run them by clicking the **Play icon**  ▶ or by using the keyboard shortcut "Command/Ctrl+Enter".

Also:
* Type **Shift+Enter** to run the cell and move focus to the next cell (adding one if none exists); or
* Type **Alt/Option+Enter** to run the cell and insert a new code cell immediately below it.

Importantly, the results of running a cell, such as variables and functions, are "remembered" for subsequent cells as long as the session continues running. If you change a variable in an earlier cell and re-run that cell, the new value is what later cells will see.

For example, in the following cell, the session will remember that the **_VARIABLE_** called "x" is the sum of two given numbers.  Once you run that cell, variable "x" will be available for use in cells below. Please select the cell and press the ▶ button to run it.  Then select the following cell and run it.

In [None]:
x = 5 + 5
print("The value of the 'x' variable is " + str(x) + ".")

The value of the 'x' variable is 10.


In [None]:
print("The value of the 'x' variable is *still* " + str(x) + " down here.")

The value of the 'x' variable is *still* 10 down here.


Python notebooks flow from **top to bottom**. This means that if a cell relies on a variable or function that was created earlier in the notebook, you must run the earlier cell first to make that information available in later cells  (_we cannot use "x" in another cell if we have not run the cell that defines it_)!
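As a small illustration, the cell below refers to a variable named `y_total`, a hypothetical name that no earlier cell has defined. Python raises a `NameError`, which is the error you will typically see when a cell is run before the cell it depends on. The `try`/`except` is there only so the example prints the message instead of stopping.

```python
try:
    # `y_total` was never assigned, as if its defining cell had been skipped.
    print(y_total)
except NameError as err:
    error_message = str(err)
    print("NameError:", error_message)
```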

Sometimes you might need to re-run the entire notebook to re-initialize things.   The Colab "Runtime" menu allows you to run the entire notebook, or run everything up to the current cell, or everything after the current cell.

### Saving files and scripts

Python notebooks can be downloaded and saved as an .ipynb file, which is what you will need to submit for your homeworks.

This is easily done from the Colab menu:

`File => Download => Download .ipynb`

In addition, a notebook can also be downloaded as a .py file, which allows it to be used as a standalone _script_ later.

`File => Download => Download .py`

A Python _script_ is an (often small) program saved as a .py file that can be run as a standalone command. Scripts often cobble together other commands into something useful that can be run again and again.

The ".py" extension identifies the file as Python code that can be run by itself, outside of a notebook.
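For illustration, a small script might look like the sketch below. The file name `summarize_scores.py`, the function names, and the scores are all hypothetical. The `if __name__ == "__main__":` check is a common Python idiom that runs `main()` only when the file is executed directly, not when it is imported from another file.

```python
# summarize_scores.py (hypothetical file name)

def summarize(numbers):
    """Return the count, total, and mean of a list of numbers."""
    count = len(numbers)
    total = sum(numbers)
    mean = total / count if count else 0.0
    return count, total, mean

def main():
    scores = [88, 92, 79]  # made-up sample data
    count, total, mean = summarize(scores)
    print(f"{count} scores, total {total}, mean {mean:.1f}")

if __name__ == "__main__":
    main()
```

Saved under that name, the script could be run from a terminal with a command such as `python summarize_scores.py`.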

## Using GenAI

Generative AI is starting to get incorporated into many different coding tools.  In Colab, a new code cell might show a prompt that says:

 `[ ] Start coding or generate with AI`

Clicking on the word <u>generate</u> will open a window where you can ask the AI to generate code for a particular task.  Try it!  Add a code cell below and give it some commands like:

 - *Draw 100 numbers from a standard normal distribution and plot a histogram*
 - *Create an X and a Y variable with 200 points each and a correlation of 0.8 then plot a scatterplot with a regression line.*


 Colab will create the code cell which you can run for the output you need!
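The generated code changes from run to run, so the cell below is only a hand-written sketch of what a response to the first prompt might look like, not actual Colab output. It assumes `numpy` and `matplotlib` are available (both are preinstalled in Colab); the seed, bin count, and labels are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Draw 100 samples from a standard normal distribution (mean 0, std 1).
rng = np.random.default_rng(seed=42)  # fixed seed so the plot is reproducible
samples = rng.standard_normal(100)

# Plot a histogram of the samples.
fig, ax = plt.subplots()
ax.hist(samples, bins=15, edgecolor="black")
ax.set_xlabel("Value")
ax.set_ylabel("Count")
ax.set_title("100 draws from a standard normal distribution")
plt.show()
```

Whatever Colab produces, read it before you run it and check that it does what you asked.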

 You can subsequently add amendments and Colab will remember what it did:
 - *plot the points from the previous plot in green with a black boundary and add axes that say "X" and "Y"*

You can also use Colab to find data and plot it (this is limited to certain data sets that are easily retrievable).

 - *Download monthly data on US inflation and plot a time series of the values*


Colab has also just started to incorporate Google's Gemini GenAI tool (look at the upper right of the window).   You can interact and chat with Gemini in the same way you would with ChatGPT. This functionality is brand new and continually changing. We will be exploring how well this works throughout the year.



