<a href="https://colab.research.google.com/github/EvanWAppel/work-examples/blob/main/Learn_Python_Scripting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to start scripting Python in an hour!

Hey, welcome to my seminar. 

This is a Colaboratory notebook. Code notebooks have been around for a long time
but Colaboratory is the free version that Google offers as part of the G-Suite.

Python notebooks are designed to give you amazing storytelling powers. 

You can write rudimentary applications.

And if you really want to ruin your life, you can learn how to use a notebook to design artificial intelligence and run machine learning algorithms.

But, it's also a very effective way to get your own thoughts straight.

First, while I'm desperately trying to wrangle the class into order, please go up to File and select "Save a Copy in Drive". This will separate your notebook from everyone else's.

Also, feel free to use the [Code-Only notebook](https://colab.research.google.com/drive/1UnQJrFfhZJ6KkqcRigkJ6aPpUEPTqJeZ#scrollTo=htwInhEkK6G6), just be sure to branch off from the original!

Now, you have your own functional notebook. Rename it something to commemorate your entry into the wide world of computer programming!

You're going to learn five things today:


*   What is a coding environment?
*   How do I get data into this environment?
*   When I have my data in the environment, what can I do with it?
*   Can I make the data look pretty?
*   How do I get data out of the environment?



![image.png](https://imgs.xkcd.com/comics/python.png)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


If you choose to go out into the world and contribute to the GREATER CODEBASE, then these three qualities will do you a good turn:


*   A nearly pedantic sense of semantics
*   Stubborn, intractable curiosity
*   And the moral elasticity to plagiarize pretty much anything that you set your twinkling eyes upon.


# Importation Step

## Libraries

Pandas is a library that helps you make table-like data structures, which are conceptually similar to spreadsheets and make for easy translation.
[Documentation for Pandas](https://pandas.pydata.org/docs/getting_started/comparison/index.html)

"from google.colab import files" pulls in a Colaboratory-specific module that helps the notebook upload and download files.

Plotly is a library for making visualizations and charts right in the notebook. [Plotly Documentation](https://plotly.com/python/)

Dates are so byzantine that you basically need a whole library to deal with their endless petty details. Hence, we need "datetime."

NUMPY is helpful for dealing with numbers in python notebooks. Get in the habit of adding it to your list of imports just in case. Trust me, you will eventually need it.

## Options

Right out of the box the notebook and libraries will have their own way of doing things, but if you find something inconvenient, look up the options and see if you might change something.

[Pandas options](https://pandas.pydata.org/docs/user_guide/options.html)

For example: vanilla pandas truncates wide tables once they pass a default column limit (20 columns in a notebook). We deal with column counts that often exceed a couple dozen, so I set the max column display to "None", meaning that there is no limit.

Similarly, I set the display width to be longer because I often use the whole page, not just a half-window.

Take it or leave it, it's optional!
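If you want to try an option without committing to it, here's a minimal sketch. It assumes nothing beyond pandas itself; the variable names are just made up for the demo. `get_option` reads a setting, and `option_context` changes it only for the indented block.

```python
import pandas as pd

# Look up the current setting before changing anything.
default_max_columns = pd.get_option("display.max_columns")

# Temporarily lift the column limit; the old value comes back after the block.
with pd.option_context("display.max_columns", None):
    inside_block = pd.get_option("display.max_columns")

after_block = pd.get_option("display.max_columns")
print(default_max_columns, inside_block, after_block)
```

`pd.set_option`, which we use in the next cell, makes the change stick for the rest of the session instead.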

In [None]:
# CELL 1
import pandas as pd
from google.colab import files 
import plotly.express as px
import datetime
import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 800)

# Ingestion Step

Okay, so we want to take a CSV and turn it into something we can manipulate. That CSV needs to become an OBJECT called a DATAFRAME. We're going to want to give that dataframe a name by assigning it to a VARIABLE. 

Remember how we were all taught that "=" means equals? Well, not anymore. In Python "=" means you're assigning a variable. "==" means equals. If this is upsetting to you, then you understand why coders have that twitchy quality.
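Here's a tiny sketch of the difference. The variable names and the number are invented for illustration.

```python
# "=" assigns: the name new_customers now points at the value 344.
new_customers = 344

# "==" compares: it asks "are these equal?" and answers True or False.
is_same = new_customers == 344
is_different = new_customers == 345

print(is_same, is_different)
```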

In [None]:
# CELL 2
customers = pd.read_csv("customers-learningset.csv")

We create a variable called customers then assign something to it, in this case the result of a function that READs the CSV into a DATAFRAME!

Now, let's see what we've done!

In [None]:
# CELL 3
print(customers.head())

   Unnamed: 0       Customer ID Phone Type Phone First Name               Date Created ID Type
0           0  7103422184752028     MOBILE           ELAYNA  2 Jan 2022 00:03:57 +0000     NaN
1           1  7103422184752480     MOBILE        Tzanninis  2 Jan 2022 00:04:33 +0000     NaN
2           2  7103422184753244     MOBILE          STAPLES  2 Jan 2022 00:06:10 +0000     NaN
3           3  7103422184754068     MOBILE                A  2 Jan 2022 00:07:08 +0000     NaN
4           4  7103422184755565     MOBILE              NaN  2 Jan 2022 00:10:17 +0000     NaN


Wow! That was easy! Only two lines and we've got a data table ready to fuss with for hours!

Also, just for funsies, put a number in between the parentheses after the head() method! Then see what happens when you run it again.
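If you don't have the CSV handy, here's a rough sketch you can run anywhere. The four rows are made up and only stand in for customers-learningset.csv; `io.StringIO` just lets pandas read a string as if it were a file.

```python
import io
import pandas as pd

# Made-up stand-in for customers-learningset.csv so the example runs anywhere.
sample_csv = io.StringIO(
    "Customer ID,Phone Type\n"
    "1,MOBILE\n"
    "2,MOBILE\n"
    "3,LANDLINE\n"
    "4,MOBILE\n"
)
sample = pd.read_csv(sample_csv)

# A number inside head() asks for that many rows instead of the default five.
first_two = sample.head(2)
print(first_two)
```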

# Transformation Step

Now, let's start manipulating our table. This is the core of data science. 

Let's say that we want to make another table that's just the count of new customers per day.



## Data Types

First step is to find the column that indicates a new customer. This happens to be the "Date Created" column. The problem with this column is its DATATYPE: each value is a full TIMESTAMP, down to the second, and pandas reads it from the CSV as plain text, which is not very helpful for AGGREGATION. We want just a straight date.

NB: The difference between METHODS and FUNCTIONS is that methods are applied to OBJECTS, but OBJECTS are ARGUMENTS for FUNCTIONS. (This will become more clear later!)
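A quick sketch of that distinction using plain Python, no pandas required. The name is borrowed from our table; the variable names are invented.

```python
first_name = "elayna"

# FUNCTION: the object goes inside the parentheses as an argument.
length = len(first_name)

# METHOD: it hangs off the object with a dot.
shouted = first_name.upper()

print(length, shouted)
```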

In [None]:
# CELL 4
customers = customers.assign(date = pd.to_datetime(customers['Date Created']).dt.date)

Start out with the dataframe variable and assign a new column using the assign method.

The assign method takes one ARGUMENT that's basically the name of the new column, an equals sign and what should go into the column. Here, we're going to use a function to turn the Date Created column into a DATETIME DATATYPE and then apply a method to turn it into just a DATE.

Did I warn you about dates or did I warn you?
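To see the two steps separately, here's a sketch on three sample strings in the same shape as our "Date Created" column (the first comes from the table above, the other two are made up). One assumption to flag: I spell out the `format` explicitly here, whereas Cell 4 lets pandas work it out on its own.

```python
import datetime
import pandas as pd

# Strings shaped like the "Date Created" column; the last two are invented.
created = pd.Series([
    "2 Jan 2022 00:03:57 +0000",
    "2 Jan 2022 23:59:59 +0000",
    "3 Jan 2022 00:10:17 +0000",
])

# Step 1: a FUNCTION turns the text into real timestamps.
timestamps = pd.to_datetime(created, format="%d %b %Y %H:%M:%S %z")

# Step 2: a METHOD (via the .dt accessor) keeps only the calendar date.
dates = timestamps.dt.date

print(dates.tolist())
print(dates.iloc[0] == datetime.date(2022, 1, 2))
```

The first two rows land on the same date even though their times differ, which is exactly what we need before counting per day.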

## Aggregation

Aggregation, at its core, is when we take a column in a table and perform a mathematical operation on the whole thing. Like, adding up all the items in the "sales" column.

Here, we want to COUNT all of the items in the table. But, we want to know what that count is by date! This requires GROUPING.

We start with a new variable, so that we don't overwrite the data in customers. Then we apply the groupby method and pass it a LIST (marked by square brackets) of the columns we want to use as DIMENSIONS, plus a keyword argument called as_index, which we set to False. Setting as_index to False keeps the grouping columns as ordinary columns instead of turning them into the INDEX of the table, so the rows keep a plain numbered index.

(Note the little counter beside the table when we print it.)

Now we apply the .agg() method to this object we're chaining together. It takes one argument: a DICTIONARY (marked by curly braces). Dictionaries are sets of key-value pairs: each key is separated from its value by a colon, and the pairs are separated by commas. The key is the name of the METRIC to be aggregated; the value is the mathematical operation to be performed.

In [None]:
# CELL 5
cust = customers.groupby(['date'],as_index = False).agg({'Customer ID':np.size})
print(cust.head())

         date  Customer ID
0  2022-01-02          344
1  2022-01-03          456
2  2022-01-04          505
3  2022-01-05          506
4  2022-01-06          471
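If the list, the dictionary and as_index are still a blur, here's a sketch on four made-up rows. Two assumptions: the toy data is invented, and I pass the string "count" as the operation (it skips missing values) rather than np.size from the cell above.

```python
import pandas as pd

# Made-up rows: three customers on one day, one on the next.
toy = pd.DataFrame({
    "date": ["2022-01-02", "2022-01-02", "2022-01-02", "2022-01-03"],
    "Customer ID": [101, 102, 103, 104],
})

group_columns = ["date"]               # a LIST: square brackets
operations = {"Customer ID": "count"}  # a DICTIONARY: curly braces, key: value

# as_index=False keeps "date" as an ordinary column with numbered rows.
flat = toy.groupby(group_columns, as_index=False).agg(operations)

# Leaving as_index at its default turns "date" into the index instead.
indexed = toy.groupby(group_columns).agg(operations)

print(flat)
print(indexed)
```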


So now we have a dataframe. Notice one of our columns is still called "Customer ID," even though it now holds a count. We're going to want to change that with the rename method.

Notice that the .rename() method takes one keyword argument, columns, whose value is a dictionary. The key is the old column name, the value is what you want to call it.


In [None]:
# CELL 6
cust = cust.rename(columns={'Customer ID':'count'})
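One thing worth seeing for yourself: rename hands back a new dataframe and leaves the old one alone, which is why the cell assigns the result back to cust. A sketch on a made-up one-row table:

```python
import pandas as pd

# A made-up one-row table with the misleading column name.
toy = pd.DataFrame({"date": ["2022-01-02"], "Customer ID": [344]})

# rename returns a new dataframe; toy itself is untouched.
renamed = toy.rename(columns={"Customer ID": "count"})

print(list(toy.columns))
print(list(renamed.columns))
```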

# Visualization Step

Now let's make it pretty.

We'll start with a new variable that we can call.

Then, we're going to use the bar() function to construct our bar graph.

It takes three arguments:

* a dataframe,
* an x-axis value,
* and a y-axis value

Then, we use the show() method to display it.

Easier than a rum-punched gringo in the sun, as ol' Ma Webster is wont to say...

In [None]:
# CELL 7
fig = px.bar(cust, x='date', y='count')
fig.show()

# Write and Download Step

So, now that we have a dataframe in our notebook, we want to give it to somebody.

First we turn the dataframe into a CSV. This process is called writing.

In [None]:
# CELL 8
cust.to_csv('customer daily count.csv')
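A heads-up, sketched on a made-up two-row table: by default to_csv also writes the row index as an unnamed first column. That is probably where the "Unnamed: 0" column in our customers table came from. Passing index=False leaves it out. (Calling to_csv with no file name just returns the text, which is handy for peeking.)

```python
import io
import pandas as pd

# Made-up miniature of the daily count table.
toy = pd.DataFrame({"date": ["2022-01-02", "2022-01-03"], "count": [344, 456]})

# With no path, to_csv returns the CSV text instead of writing a file.
with_index = toy.to_csv()
without_index = toy.to_csv(index=False)

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Reading the indexed version back in shows how pandas names the blank header.
reread = pd.read_csv(io.StringIO(with_index))
print(list(reread.columns))
```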

Then, we can download it using the "files" library.

In [None]:
# CELL 9
files.download('customer daily count.csv')
