# Learning Python

**By Anthony DeBarros**
[@anthonydb](https://twitter.com/anthonydb)

Welcome to Python! In this notebook, you'll find examples of Python code that start with the basics—the building blocks of the language—and work through examples including data transformation, reading data from an API, reading and writing files, and scraping a website. For additional information, the documentation at https://www.python.org/ is helpful.

## Getting Started

These exercises make a few assumptions:

- You have Python 3 installed on your computer.
- You know how to install packages using pip (I highly recommend you use a virtual environment). For details, see https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/

Let's start with checking your Python version:

#### Checking your Python version

In [None]:
# To tell the notebook to execute the contents of a cell, press SHIFT + ENTER, or click the arrow in the toolbar.

import sys
print(sys.version)

#### Installing dependencies

For the more advanced portions of this exercise, you'll need to have the following Python libraries installed:

- requests
- BeautifulSoup 4
- pandas

You can do that via `pip`. Note that you may need to restart the notebook kernel to use the new packages.

In [None]:
%pip install requests beautifulsoup4 pandas

## Printing

The `print()` function sends output to your terminal or, in this case, to the notebook.

In [None]:
# Example

print('Hello, world!')


## Variables

Variables hold data of several types, including strings, numbers and objects such as lists. To store a value in a variable, we use the `=` operator. Python is a "dynamically typed" language, meaning that you don't need to declare a variable's type before assigning a value to it. 
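As a small illustration of dynamic typing, the sketch below assigns values of two different types to the same variable name and checks the type each time with the built-in `type()` function. The variable names are made up for this example.

```python
# The same variable name can hold values of different types over time.
value = 5
first_type = type(value).__name__     # the type name is 'int'

value = 'five'
second_type = type(value).__name__    # now the type name is 'str'

print(first_type, second_type)
```

Python decides the type from the value on the right side of the `=` operator, so no declaration is needed.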

#### Integers and decimals

In [None]:
# Let's start with basic math. We can use Python as a calculator.

5 + 4.1

In [None]:
# But it's more powerful to assign these values to variables.

a = 5
b = 4.1

In [None]:
# Now you can do math with the variables!

print(a + b)
print(a + 5)
print(a / b)
print(a ** 2)

#### Strings

In [None]:
# When assigning string (text) values to a variable, you enclose the text in either single or double quotes.

a = 'Horse'
print(a)

In [None]:
# Strings have methods you can use to transform text.

print(a.upper())
print(a.lower())
print(a.replace('e', 'ey'))
print(a.replace('e', 'ey').lower())

# List of string methods at https://www.w3schools.com/python/python_ref_string.asp

In [None]:
# The methods do not alter the original string.
print(a)

In [None]:
# If you want to permanently store the change, assign the value to a new variable
uppercase_a = a.upper()
uppercase_a

In [None]:
# Concatenation: Combining strings into a single output.

car_make = 'Honda'
car_model = 'Civic'
car_year = '2012'

print(car_make + ' ' + car_model)

In [None]:
# Note: If you want to concatenate a number and a string, you must convert the number to a string
# using the str() function!

a = 5

print('The number is ' + str(a))

In [None]:
# String interpolation: Insert variables into strings.

name = 'Bobby'
product = 'beachball polish'
day = 'Friday'

print(f'Hi, {name}! Thanks for your purchase of {product} on {day}! Would you like to view more products like {product}?')

## Constants and Logical Operators

Frequently used [constants](https://docs.python.org/3/library/constants.html) and logical operators in Python include:

* `True` and `False` <-- Boolean values
* `None` <-- Indicates the absence of a value
* `==` <-- Tests for equality
* `and`, `or`, `not`

In [None]:
# Let's load up some variables.

flavor = 'chocolate'
cone = 'sugar'

In [None]:
# Using the equality operator == , you can test whether a variable holds a certain value.

# This will return False:

flavor == 'maple'

In [None]:
# This will return True:

flavor == 'chocolate'

In [None]:
# To test whether multiple conditions are True, use the `and` operator. 

flavor == 'chocolate' and cone == 'sugar'

In [None]:
# Use `or` to check for one or the other condition being True.

flavor == 'chocolate' or flavor == 'vanilla'

In [None]:
# Assign a value of True or False to a variable.

is_open = False
print(is_open)

In [None]:
# None is a special value that indicates "no value".

topping = None
print(topping)

In [None]:
topping is not None
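The `not` operator from the list above reverses a Boolean value. Here is a minimal sketch that reuses the variable names from the cells above and repeats their values so it runs on its own. The `!=` ("not equal") operator is included because it is a common shorthand for `not` combined with `==`.

```python
flavor = 'chocolate'
is_open = False
topping = None

is_closed = not is_open                 # not False gives True
not_maple = not (flavor == 'maple')     # the comparison is False, so this is True
also_not_maple = flavor != 'maple'      # != gives the same result as the line above
no_topping = topping is None            # True, because topping holds None

print(is_closed, not_maple, also_not_maple, no_topping)
```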

## Data Structures

Python implements several data structures that are particularly suited to organizing data. They include:
* Lists
* Tuples
* Dictionaries

#### Lists

In [None]:
# Lists are a series of items enclosed in square brackets. Lists are mutable, meaning you can add,
# remove or re-order items. They are similar to arrays in other languages.

car_models_list = ['Toyota', 'Buick', 'Kia', 'Jeep']

In [None]:
# Reference each item in a list by its position. The first item is at position 0.
# Retrieve the first item:

print(car_models_list[0])

In [None]:
# Slicing is a way to retrieve a range of items from a list.

# a[start:stop]  # from start through stop-1
# a[start:]      # from start through the rest of the list
# a[:stop]       # from the beginning of the list through stop-1
# a[:]           # the whole list


# Here, retrieve the first two items:

print(car_models_list[0:2])

In [None]:
# If you need the last item in a list, you can reference it like so:

print(car_models_list[-1])

In [None]:
# Find the length of a list.

print(len(car_models_list))

In [None]:
# Sort the list.

car_models_list.sort()
print(car_models_list)

In [None]:
# Add and remove items.

car_models_list.append('Honda')
car_models_list.remove('Kia')
print(car_models_list)

In [None]:
# Lists can hold items of various types, including more lists!

big_list = ['news', 1, ['house', 'rain'], 9]

# To reference the list within the list, use additional index values. For example,
# to get the value of 'rain' from the list:
print(big_list[2][1])

In [None]:
# Turn a sentence into a list with the split() string method

b = 'It was a dark and stormy night.'
print(b.split())

#### Tuples

In [None]:
# Tuples are similar to lists -- with two big exceptions:
# 1. The items are enclosed in parentheses.
# 2. Tuples are immutable. You cannot add, remove or change items.

flavors = ('chocolate', 'banana', 'cheesecake')
print(flavors[1])

In [None]:
# Why would you use a tuple instead of a list? One reason 
# is to store a finite collection of values that should never change.
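To see immutability in action, the sketch below tries to change an item in the `flavors` tuple and catches the resulting error with `try` and `except`, which this notebook does not otherwise cover. It then builds a new tuple instead. The name `more_flavors` is made up for this example.

```python
flavors = ('chocolate', 'banana', 'cheesecake')

try:
    flavors[0] = 'vanilla'       # Tuples do not support item assignment
except TypeError as error:
    message = str(error)
    print('Could not change the tuple:', message)

# To "change" a tuple, build a new one. The original stays as it was.
more_flavors = flavors + ('vanilla',)
print(flavors)
print(more_flavors)
```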

#### Dictionaries

In [None]:
# Dictionaries store pairs of keys and values enclosed in curly braces. 
# You reference the value by providing the key name.

cars_dict = {'make': 'Honda', 'year': 2010, 'color': 'Silver'}

In [None]:
# Retrieve the value for the key 'year'

print(cars_dict['year'])

In [None]:
# Set a new value for the key 'year'

cars_dict['year'] = 2013
print(cars_dict['year'])

In [None]:
# Add a new key and value to the dictionary
cars_dict['condition'] = 'Good'
print(cars_dict)

In [None]:
# You can add a key whose values are a list!

cars_dict['owners'] = ['Susie', 'Liza', 'Marta']
print(cars_dict)

In [None]:
# Here's how to reference an item in that list:

print(cars_dict['owners'][1])

#### Combining structures

It's common in Python (and other programming languages) to combine data structures to facilitate handling large collections of data. Here are two examples.

In [None]:
# List of tuples

performers_lists = [
    (1, 'Daisy', 'Ridley'),
    (2, 'Mark', 'Hamill'),
    (3, 'John', 'Boyega')
]

# List of dictionaries

performers_dicts = [
    {'id': 1, 'first_name': 'Daisy', 'last_name': 'Ridley'},
    {'id': 2, 'first_name': 'Mark', 'last_name': 'Hamill'},
    {'id': 3, 'first_name': 'John', 'last_name': 'Boyega'}
]

In [None]:
# Accessing items

print(performers_lists[2][2])
print(performers_dicts[2]['last_name'])
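Two more access patterns can be handy with combined structures. This is a sketch rather than the only approach: it unpacks one tuple into separate variables, and it uses the dictionary `.get()` method to supply a default when a key is missing. The `'role'` key is hypothetical and does not exist in the data above.

```python
performers_lists = [
    (1, 'Daisy', 'Ridley'),
    (2, 'Mark', 'Hamill'),
    (3, 'John', 'Boyega')
]

performers_dicts = [
    {'id': 1, 'first_name': 'Daisy', 'last_name': 'Ridley'},
    {'id': 2, 'first_name': 'Mark', 'last_name': 'Hamill'},
    {'id': 3, 'first_name': 'John', 'last_name': 'Boyega'}
]

# Unpack one tuple into three variables in a single assignment.
performer_id, first_name, last_name = performers_lists[1]
full_name = first_name + ' ' + last_name
print(performer_id, full_name)

# .get() returns the default value instead of raising a KeyError
# when the key (here the hypothetical 'role') is not present.
role = performers_dicts[0].get('role', 'unknown')
print(role)
```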

## Control Flows

Control flows allow us to introduce logic and make our program behave based on criteria we determine.

#### `for` statement

In [None]:
# Iterate over a list or string and perform an action on each item.
# Note that Python requires consistent indentation for each line to be executed in the for statement.
# Four spaces per level is the standard convention.

number_list = [1, 7, 42]

for number in number_list:
    new_number = number * 5
    print(f'{number} times 5 equals {new_number}')
    

In [None]:
# You can nest a for statement inside a for statement.

numbers_lists = [
    [9, 2, 7],
    [10, 5, 21],
    [31, 22, 4]
]

for inner_list in numbers_lists:      # avoid naming a variable "list", which would hide Python's built-in list type
    for number in inner_list:
        new_number = number * 5
        print(f'{number} times 5 equals {new_number}')
        

In [None]:
# Employing the range() function to create a number series to iterate over

for x in range(0,10):
    print(x)

#### `if ... elif ... else` statement

In [None]:
# Perform actions or control the flow of the program based on criteria.

temperature = 91
if temperature > 80:
    print('It\'s hot!')

In [None]:
# Add additional logic and control.

vote = 'Yes'

if vote == 'Yes':
    print('The vote is Yes')
elif vote == 'No':
    print('The vote is No')
else:
    print('It\'s some other vote')

In [None]:
# Handle multiple criteria and text cases

vote = 'yes'

if vote.upper() in ('YES', 'Y'):
    print('The vote is Yes')
elif vote.upper() in ('NO', 'N'):
    print('The vote is No')
else:
    print('It\'s some other vote')

#### `while` statement

In [None]:
# Perform an action as long as a condition is true

count = 5

while (count >= 0):
    print('The count is: ' + str(count))
    count = count - 1

print('Liftoff!')


## Files: Opening, Reading, Writing, Saving

Opening a file and reading its contents is a common programming task. Here, we'll work with a plain text file. There are Python libraries that deal specifically with other file types, such as PDF or Excel.

In [None]:
# Open a file and print all the lines.

f = open('data/story.txt', 'r')

for line in f:
    print(line) 
    
f.close()

In [None]:
# Open a file and split all the words into a list

words_list = []

f = open('data/story.txt', 'r')

for line in f:
    for x in line.split():
        words_list.append(x.lower())

print(words_list)
f.close()

In [None]:
# Open a file and write to it.

f = open('output/my_file.txt', 'w')

f.write('Hello!\n')
f.write('This is the second line.')

f.close()
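The cells above call `f.close()` by hand. An alternative is the `with` statement, which closes the file automatically when its indented block ends, even if an error occurs. The sketch below is one way to write and then re-read a file. It assumes a temporary directory from the standard `tempfile` module in place of the `output/` folder so that it runs anywhere.

```python
import os
import tempfile

lines = []

# A temporary directory stands in for the notebook's output/ folder.
with tempfile.TemporaryDirectory() as temp_dir:
    file_path = os.path.join(temp_dir, 'my_file.txt')

    # The file is closed automatically when the with block ends.
    with open(file_path, 'w') as f:
        f.write('Hello!\n')
        f.write('This is the second line.')

    # Read the file back, stripping the newline at the end of each line.
    with open(file_path, 'r') as f:
        for line in f:
            lines.append(line.rstrip('\n'))

print(lines)
print(f.closed)
```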

## Read and Transform Data from a CSV file

Among the most unspectacular but necessary data analysis tasks is reading, transforming, and saving data. You can get a lot done with just the Python standard library. We are going to send data back and forth among formats, particularly CSV and JSON, and transform it along the way.

In [None]:
# Import libraries

import csv
from itertools import islice

In [None]:
# Open a file and use the reader function to display each line.
# file_reader is an iterable reader object.
# Each line in the file becomes a Python list, and 
# because each line is a list, you can reference specific elements.

with open('data/us_counties_2010.csv') as csv_file:
    file_reader = csv.reader(csv_file)
    for row in file_reader:
        print(row[0] + ',' + row[1] + ',' + row[9])

In [None]:
# We also can slice the reader object with itertools.islice() to remove
# the header and just fetch a few rows.

with open('data/us_counties_2010.csv') as csv_file:
    file_reader = csv.reader(csv_file)
    for row in islice(file_reader, 1, 4):
        print(row[0] + ',' + row[1] + ',' + row[9])

In [None]:
# DictReader creates an object in which each row is a dictionary
# with key names from the header row. (Before Python 3.8, each row was an OrderedDict.)

with open('data/us_counties_2010.csv') as csv_file:
    file_reader = csv.DictReader(csv_file)
    for row in islice(file_reader, 0, 4):
        print(row)
        

In [None]:
# Then, you can pull elements of each line via their key name.

with open('data/us_counties_2010.csv') as csv_file:
    file_reader = csv.DictReader(csv_file)
    for row in islice(file_reader, 0, 4):
        print(row['NAME'] + ',' + row['STUSAB'] + ',' + row['POP100'])

In [None]:
# Write your output to a new CSV file

out_file = open('output/new_counties_2010.csv', 'w')
out_file.write('county,state,pop100\n')    # no spaces after the commas, so the header names stay clean

with open('data/us_counties_2010.csv') as csv_file:
    file_reader = csv.DictReader(csv_file)
    for row in islice(file_reader, 0, 4):
        out_file.write(row['NAME'] + ',' + row['STUSAB'] + ',' + row['POP100'] + '\n')

out_file.close()
print('Finished')
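Joining values with `+ ','` works until a value itself contains a comma. As a hedged alternative, the sketch below uses `csv.DictWriter`, which adds quotes where needed. The sample rows are invented for illustration, and an in-memory `io.StringIO` buffer stands in for a real file (which you would open with `newline=''`).

```python
import csv
import io

# Invented sample rows. Note the comma inside the first county name.
sample_rows = [
    {'county': 'Sample, County', 'state': 'XX', 'pop100': 100},
    {'county': 'Example County', 'state': 'YY', 'pop100': 250}
]

buffer = io.StringIO()    # in-memory stand-in for an open file
writer = csv.DictWriter(buffer, fieldnames=['county', 'state', 'pop100'])

writer.writeheader()              # writes the fieldnames as the first row
for row in sample_rows:
    writer.writerow(row)          # values containing commas are quoted automatically

csv_text = buffer.getvalue()
print(csv_text)
```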

## Convert a CSV file to JSON

In [None]:
# Dictionaries and lists are easily transformed to JSON.

import json
import collections

# Define an empty list of dictionaries. Each dict will hold data on one state.
state_pop_list = []

# Open and read the CSV.
with open('data/us_counties_2010.csv') as csv_file:
    file_reader = csv.DictReader(csv_file)

    # Turn each row into an ordered dictionary
    for row in islice(file_reader, 0, 4):       # Note that here you're slicing only a few lines for demo purposes. Remove the islice to convert the entire file.
        state_dict = collections.OrderedDict()
        state_dict['cty'] = row['NAME']
        state_dict['st'] = row['STUSAB']
        state_dict['pop2010'] = int(row['POP100'])
        # Append the dictionary to the list
        state_pop_list.append(state_dict)

# Use the json library to format the list of dicts as JSON and print.        
print(json.dumps(state_pop_list, indent=4))

# Write the results to a file.
json_out = json.dumps(state_pop_list)

with open('output/us_counties_2010.json', 'w') as j:
    j.write(json_out)

## Reading JSON from an API and Saving as a CSV

Many organizations -- including governments -- provide data via API, or Application Programming Interface. Depending on the system, an API can allow you to retrieve data, make use of system functions, or submit information.

In [None]:
# https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2019-09-08&endtime=2019-09-10

import csv         # standard Python module for CSV data; read/write CSV files
import json        # standard Python module for JSON data; converts JSON to Python datatypes and vice-versa
import datetime    # standard Python module for dates
import requests    # open source Python module for making web requests

# Create a string that holds our API URL.
quakes_url = 'https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2019-09-08&endtime=2019-09-10'

# Use requests to fetch everything at URL into a variable called r.
r = requests.get(quakes_url)

# r is now a requests Response object with various attributes and methods.
# The r.text attribute holds the text retrieved from the URL.

# Convert the JSON to Python data and assign that to a variable called response.
response = json.loads(r.text)

#print(response)                                       # Prints the entire response
print(response['features'][0]['properties']['place'])  # Prints the contents of one JSON attribute


In [None]:
# Retrieve data and write to CSV.

# Open a file for writing
earthquakes = open('output/earthquakes.csv', 'w', newline='')    # newline='' is recommended when using the csv module

# Create a CSV writer object
quake_writer = csv.writer(earthquakes, delimiter=",")

# Create a list holding headers for the CSV columns and write to file
headers = ['TIME', 'PLACE', 'MAGNITUDE']
quake_writer.writerow(headers)

# How long is the list of dictionaries called 'features'?
length_of_list = len(response['features'])

# Iterate through the list of features
for i in range(0, length_of_list):
    # Retrieve each column
    time = response['features'][i]['properties']['time']
    place = response['features'][i]['properties']['place']
    magnitude = response['features'][i]['properties']['mag']
    
    # We have to convert the time from the 'epoch' format to something readable
    time = datetime.datetime.fromtimestamp(time / 1000.0).strftime('%Y-%m-%d %H:%M:%S')
    
    # Assemble the column values into a tuple and write to the CSV
    quake = (time, place, magnitude)
    quake_writer.writerow(quake)
    
# Close the CSV file    
earthquakes.close()
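The time conversion in the loop deserves a closer look. The API returns time as milliseconds since January 1, 1970 (the "epoch"), so the code divides by 1000 to get seconds. Note that `fromtimestamp()` without a time zone returns your computer's local time. The sketch below uses a hypothetical timestamp value and passes `tz=datetime.timezone.utc` so the result is the same on any machine.

```python
import datetime

epoch_ms = 1568000000000    # hypothetical timestamp in milliseconds, not taken from the API

# Divide by 1000 to convert milliseconds to seconds, and request UTC explicitly.
quake_time = datetime.datetime.fromtimestamp(epoch_ms / 1000.0, tz=datetime.timezone.utc)
formatted = quake_time.strftime('%Y-%m-%d %H:%M:%S')

print(formatted)    # 2019-09-09 03:33:20
```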

## Data Analysis Using pandas

pandas is a Python data analysis library. Learn more at https://pandas.pydata.org/

#### Largest county in square miles

In [None]:
# Import the pandas library and read the CSV.

import pandas as pd

pd.options.display.max_rows = 999   # This setting will increase the number of result rows you can see in the notebook.

census_df = pd.read_csv('data/us_counties_2010.csv')
census_df.head(4)

In [None]:
# Count number of counties per state.

census_df['STUSAB'].value_counts().reset_index(name='count')

# Also works:
# census_df.groupby('STUSAB')['STUSAB'].agg('count').reset_index(name='count')

In [None]:
# Total, mean and median of the POP100 column.

total = census_df['POP100'].sum()
mean = census_df['POP100'].mean()
median = census_df['POP100'].median()

print('Total: ' + str(total))
print('Mean: ' + str(mean))
print('Median: ' + str(median))

In [None]:
# Find the largest county in square miles.

# First, convert the AREALAND column values from square meters to square miles and add a column
# to the dataframe.
square_miles = (census_df['AREALAND'] / 2589988.110336)
census_df['square_miles'] = square_miles

# Round the column, then sort by square_miles and show the first 10 rows of the dataframe
census_df['square_miles'] = census_df['square_miles'].round(decimals = 1)
census_df.sort_values(by = 'square_miles', ascending = False).head(10)


#### Total Housing Units By State

In [None]:
# Here, you group the dataframe by the STUSAB column, select HU100 and calculate the sum.
# Selecting the column before calling sum() avoids summing every other column too.

census_df.groupby('STUSAB')[['HU100']].sum()                      # creates a dataframe
# census_df.groupby('STUSAB')['HU100'].sum()                      # creates a series
# census_df.groupby('STUSAB')['HU100'].sum().reset_index()        # creates a series and resets to a dataframe

# Output to CSV
# census_df.groupby('STUSAB')['HU100'].sum().to_csv('housing.csv', header=True)
# census_df.groupby('STUSAB')[['HU100']].sum().to_csv('housing.csv')

#### Population per Housing Unit

In [None]:
# Find the counties with the most people per housing unit.

pop_per_hu = (census_df['POP100'] / census_df['HU100'])
census_df['pop_per_hu'] = pop_per_hu
census_df['pop_per_hu'] = census_df['pop_per_hu'].round(decimals = 1)
census_df.sort_values(by = 'pop_per_hu', ascending = False).head(5)

#### Back to those Earthquakes ...

In [None]:
# Quakes sorted by magnitude

quakes_df = pd.read_csv('output/earthquakes.csv')
quakes_df.sort_values(by = 'MAGNITUDE', ascending = False).head(7)


## Scraping a Website

Sometimes, the data you want isn't readily downloadable. But you can retrieve it programmatically with Python from a website. Here, we'll scrape data from the FAA's airplane registry.

In [None]:
# URL to scrape:
# https://registry.faa.gov/currentreg/CurrentRegReport_Results.aspx?Mfrtxt=&sort_option=2&PageNo=1

# Import libraries
import re                        # standard Python module for regular expressions
import csv                       # standard Python module for CSV data; read/write CSV files
import requests                  # open source Python module for making web requests
from bs4 import BeautifulSoup    # open source Python module for parsing HTML

In [None]:
url = 'https://registry.faa.gov/currentreg/CurrentRegReport_Results.aspx?Mfrtxt=&Sort_Option=2&PageNo=1'
r = requests.get(url)
# print(r.text)

# Convert the HTML to a Beautiful Soup object, which represents the page as a tree of Python objects you can search.

html_soup = BeautifulSoup(r.text, 'html.parser')

In [None]:
# Find the table that has a certain CSS class

table = html_soup.find('table', {"class": "Repeater"})
print(table)

In [None]:
# Find all the table rows

for row in table.find_all('tr')[1:]:
    columns = row.find_all('td')
    link = columns[0].find('a').get('href')
    n_number = columns[0].find('a').text
    name = columns[1].find('span').text
    model = columns[2].find('span').text
    cert_date = columns[3].find('span').text
    name_address = columns[4].text
    print(link, n_number, name, model, cert_date, name_address)


In [None]:
# Same code, but here we write to a CSV:

csvfile = open('output/aircraft.csv', 'w', newline='')    # newline='' is recommended when using the csv module
csvwriter = csv.writer(csvfile, delimiter=',')

headers = ('LINK', 'N_NUMBER', 'NAME',
           'MODEL', 'CERT_DATE', 'NAME_ADDRESS')
csvwriter.writerow(headers)

for row in table.find_all('tr')[1:]:
    columns = row.find_all('td')
    link = columns[0].find('a').get('href')
    n_number = columns[0].find('a').text
    name = columns[1].find('span').text
    model = columns[2].find('span').text
    cert_date = columns[3].find('span').text
    name_address = columns[4].text
    
    parsed_row = (link, n_number, name, model, cert_date, name_address)
    
    csvwriter.writerow(parsed_row)

csvfile.close()

In [None]:
# How can you modify this code to iterate through each page of the table?
# One approach is to use a Python for statement to create one URL for each page
# in the table. You could then add your processing code to this.

url = 'https://registry.faa.gov/currentreg/CurrentRegReport_Results.aspx?Mfrtxt=&Sort_Option=2&PageNo='

for page in range(1, 78):
    url_scrape = url + str(page)
    print(url_scrape)
    
    # Processing happens here, once for each page.
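One way to fill in the processing step is to move the per-page work into a function and collect its results in a single list. The sketch below shows only that pattern: `scrape_page()` is a hypothetical placeholder that makes no web request, and the loop covers just the first three pages.

```python
base_url = 'https://registry.faa.gov/currentreg/CurrentRegReport_Results.aspx?Mfrtxt=&Sort_Option=2&PageNo='

def scrape_page(page_url):
    # Hypothetical placeholder. A real version would fetch page_url with
    # requests, parse the table with BeautifulSoup and return one tuple per row.
    return [(page_url,)]

all_rows = []

for page in range(1, 4):              # pages 1, 2 and 3 only, for demonstration
    url_scrape = base_url + str(page)
    page_rows = scrape_page(url_scrape)
    all_rows.extend(page_rows)        # extend() adds each row to the combined list

print(len(all_rows))
```

Once every page has been processed, `all_rows` can be written to a CSV file in one pass.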