# Introduction to Python

## Introduction

These are my notes from DataCamp's course [_Introduction to Python_](https://www.datacamp.com/courses/intro-to-python-for-data-science).

Course by Hugo Bowne-Anderson, with collaborators Vincent Vankrunkelsven and Filip Schouwenaars.

There are no prerequisites for this course.

This course is part of these tracks:

- Data Analysis with Python
- Data Engineer
- Data Scientist with Python
- Data Scientist Professional with Python
- Python Fundamentals
- Python Programmer

## Imports

Imports for the entire notebook are placed here for convenience and clarity.

In [None]:
import csv
import math
import sys

import numpy as np

## Resource

The source code for this course is available at https://github.com/datacamp/courses-introduction-to-python.

## Datasets

| Name | File |
| :--- | :--- |
| MLB (baseball) | baseball.csv |
| FIFA (soccer) | fifa.csv |

The baseball data set is available from http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights, which provides the data only in HTML format.

### Baseball Data

#### Read the Data

Read the data from the CSV file and save certain columns into NumPy arrays.

In [None]:
# Read the height, weight, and age data from baseball.csv into numpy.array variables.
# Height is in inches, weight is in pounds, and age is in years.
# Read the data from the CSV file into lists.
bb_height_list = []
bb_weight_list = []
bb_age_list = []
with open("baseball.csv", newline="") as csv_file:
    csvreader = csv.reader(csv_file)
    header = next(csvreader)
    for row in csvreader:
        bb_height_list.append(int(row[3]))
        bb_weight_list.append(int(row[4]))
        bb_age_list.append(float(row[5]))

# Create numpy array variables.
np_bb_height_inches = np.array(bb_height_list)
np_bb_weight_pounds = np.array(bb_weight_list)
np_bb_age_years = np.array(bb_age_list)

### FIFA Data

#### Read the Data

Read the data from the CSV file into NumPy array variables.

In [None]:
# FIFA data.
# Ordinarily, I would use a csv.DictReader to read the data row by row.
# How do I efficiently convert the data into a numpy.ndarray?
# With mixed data such as that in this file, I don't necessarily want a
# single 2D array because all data must have the same type.

# These are the fields in fifa.csv:
# id, name, rating, position, height, foot, rare, pace, shooting, passing,
# dribbling, defending, heading, diving, handling, kicking, reflexes, speed,
# positioning
#   position is at index 3.
#   height is at index 4.

# Read the data from the CSV file into lists.
# Strip white space from each value.
fifa_position_list = []
fifa_height_list = []
with open("fifa.csv", newline="") as csv_file:
    csvreader = csv.reader(csv_file)
    header = next(csvreader)
    for row in csvreader:
        fifa_position_list.append(row[3].strip())
        fifa_height_list.append(int(row[4].strip()))
np_fifa_positions = np.array(fifa_position_list)
np_fifa_heights = np.array(fifa_height_list)
print("np_fifa_positions shape: ", np_fifa_positions.shape)
print("np_fifa_heights shape: ", np_fifa_heights.shape)
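
One possible answer to the question raised in the comments above is numpy.genfromtxt, which can read delimited text into a structured array with one typed field per column. This is only a minimal sketch under assumptions: the inline sample data and its four columns are hypothetical stand-ins for fifa.csv, and I have not run it against the real file.

```python
import io

import numpy as np

# Hypothetical sample data standing in for fifa.csv (not the real file).
sample_csv = io.StringIO(
    "id,name,position,height\n"
    "1,Ann, GK ,190\n"
    "2,Bob, ST ,180\n"
    "3,Cy, GK ,188\n"
)

# names=True takes the field names from the header row.
# dtype=None lets NumPy infer a type for each column separately.
# autostrip=True removes surrounding white space from each value.
sample_players = np.genfromtxt(
    sample_csv,
    delimiter=",",
    names=True,
    dtype=None,
    encoding="utf-8",
    autostrip=True,
)

# Each column is available by name as a 1D array with its own type.
print(sample_players["position"])
print(sample_players["height"])
```

Each field of the structured array keeps its own type, so the single-type restriction of a plain 2D array does not apply.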

## Basics

### The Interface

This lesson explains how to use DataCamp's built-in IPython shell, which is part of the Jupyter ecosystem. On Spica (macOS) and Wezen (Windows 11 with Windows Subsystem for Linux), I am running Python 3.10.x, and pip3 installed IPython 8.4.0 (released 2022-05-28). At the time of writing, DataCamp is running Python 3.9.7.

In [None]:
sys.version

DataCamp's interface provides a script editing window at the top and an interactive shell at the bottom. To check a script before submitting an answer, click the Run Code button. (To see the output from a script, call the print() function.) Click the Submit Answer button to submit a script for checking by DataCamp's interface.

### Python as a Calculator

Python as a calculator introduces the +, -, \*, /, \*\*, and % operators.
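
As a quick illustration of the operators listed above, here is a small sketch; the operands 7 and 3 are arbitrary values I picked for the example.

```python
# Each arithmetic operator applied to the same pair of integers.
total = 7 + 3        # addition
difference = 7 - 3   # subtraction
product = 7 * 3      # multiplication
quotient = 7 / 3     # true division always returns a float
power = 7 ** 3       # exponentiation
remainder = 7 % 3    # modulo: the remainder after division

print(total, difference, product, quotient, power, remainder)
```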

### Variables and Types

Create new variables and use type() to find the type of a variable.

In [None]:
height = 1.79
weight = 68.7
bmi = weight / height ** 2
bmi

In [None]:
type(bmi)

In [None]:
day_of_week = 5
type(day_of_week)

In [None]:
x = "body mass type"
y = 'this works too'
type(y)

In [None]:
z = True
type(z)

Operators behave differently depending on the type of the object they're working on.

In [None]:
2 + 3

In [None]:
'ab' + 'cd'
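
A few more cases along the same lines, as a sketch with arbitrary example values: the + and \* operators are also defined for lists and strings, and Booleans behave like the integers 1 and 0 in arithmetic.

```python
# The same operator means different things for different types.
number_sum = 2 + 3        # integer addition
text_sum = "ab" + "cd"    # string concatenation
list_sum = [1, 2] + [3]   # list concatenation
repeated = "ab" * 3       # string repetition
bool_sum = True + True    # Booleans are treated as 1 and 0

print(number_sum, text_sum, list_sum, repeated, bool_sum)
```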

#### Exercises

Create and use variables.

In [None]:
savings = 100
growth_multiplier = 1.1
result = savings * growth_multiplier ** 7
print(result)

Create variables of different types, and obtain the type of a variable using type():

In [None]:
desc = "Compound interest"
profitable = True
print(type(savings))
print(type(result))
print(type(desc))
print(type(profitable))

In [None]:
savings = 100
growth_multiplier = 1.1
desc = "compound interest"
year1 = savings * growth_multiplier
print(type(year1))
doubledesc = desc + desc
print(doubledesc)

### Type Conversion

Python often requires conversion of a type to complete an operation. Type conversion is critical when building strings.
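
To see why the conversion matters, the sketch below tries to concatenate a string and an integer, which raises a TypeError, and then shows two ways around it. The variable names are hypothetical and chosen so they do not clash with the exercise variables.

```python
savings_demo = 100  # hypothetical value for illustration

try:
    # str + int is not defined, so this line raises TypeError.
    message = "I have $" + savings_demo
except TypeError as error:
    message = None
    print("TypeError:", error)

# Convert explicitly with str() before concatenating.
converted = "I have $" + str(savings_demo)

# An f-string performs the conversion as part of formatting.
formatted = f"I have ${savings_demo}"

print(converted)
print(formatted)
```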

#### Exercises

In [None]:
savings = 100
growth_multiplier = 1.1
result = savings * growth_multiplier ** 7
print("I started with $" + str(savings) + " and now have $" + str(result) + ". Awesome!")


In [None]:
pi_string = "3.1415926"
pi_float = float(pi_string)
print(pi_float)

In [None]:
print("I can add integers, like " + str(5) + " to strings.")
print("I said " + ("Hey " * 2) + "Hey!")
print("The correct answer to this multiple choice exercise is answer number " + str(2))
print(True + False)

## Lists

### Creating Lists

A list is an ordered collection of values, which may be of different types. Technically, a list is a one-dimensional array of objects. A list can contain elements that are also lists.

#### Exercises

In [None]:
fam = [1.73, 1.68, 1.71, 1.89]
fam = ["liz", 1.73, "mom", 1.68, "emma", 1.71, "dad", 1.89]
print(fam)

In [None]:
fam2 = [
    ["liz", 1.73],
    ["mom", 1.68],
    ["emma", 1.71],
    ["dad", 1.89]]
print(fam2)

In [None]:
print(type(fam))

In [None]:
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50
areas = [hall, kit, liv, bed, bath]
print(areas)

In [None]:
areas = ["hallway", hall, "kitchen", kit, "living room", liv,
         "bedroom", bed, "bathroom", bath]
print(areas)

In [None]:
house = [["hallway", hall],
        ["kitchen", kit],
        ["living room", liv],
        ["bedroom", bed],
        ["bathroom", bath]]
print(house)
print(type(house))

### Subsetting Lists

Indexes, which are 0-based, are used to select individual elements from a list. Negative indexes count from the end of the list, so -1 is the last element. Slicing means selecting multiple elements from a list. In a slice such as fam[3:5], the element at the first index is included and the element at the last index is excluded. Omitting an index extends the slice to the beginning or the end of the list.
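
The indexing and slicing rules can be sketched on a small list; the list letters is a hypothetical example and not part of the course data.

```python
letters = ["a", "b", "c", "d", "e"]  # hypothetical example list

first = letters[0]     # index 0 is the first element
last = letters[-1]     # index -1 is the last element
middle = letters[1:3]  # indexes 1 and 2; index 3 is excluded
head = letters[:2]     # from the beginning up to, not including, index 2
tail = letters[3:]     # from index 3 to the end

print(first, last, middle, head, tail)
```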

#### Exercises

In [None]:
print(fam)
print(fam[7])
print(fam[-1])
print(fam[3:5])
print(fam[:4])
print(fam[5:])

In [None]:
eat_sleep_area = areas[3] + areas[7]
print(eat_sleep_area)

In [None]:
downstairs = areas[0:6]
print(downstairs)
upstairs = areas[6:10]
print(upstairs)

In [None]:
downstairs = areas[:6]
print(downstairs)
upstairs = areas[6:]
print(upstairs)

In [None]:
print(house[-1][1])

### Manipulating Lists

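Manipulation includes changing, adding, and removing list elements. Assigning a list to another variable copies a reference to the same list rather than the values, so a change made through one name is visible through the other. Use list() or a slice of all elements ([:]) to copy a list.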

In [None]:
fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
print(fam)
fam[7] = 1.86
print(fam)
fam[0:2] = ['lisa', 1.74]
print(fam)

In [None]:
fam_ext = fam + ['me', 1.79]
print(fam_ext)

In [None]:
del fam[2]
print(fam)

In [None]:
x = ["a", "b", "c"]
print(x)
y = x
y[1] = "z"
print(y)
print(x)

In [None]:
x = ["a", "b", "c"]
y = list(x)
z = x[:]
y[1] = "z"
print(x)
print(y)
print(z)
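
The difference between a second reference and a copy can also be checked directly with the is operator, which tests whether two names refer to the same object. This is a small sketch with hypothetical variable names.

```python
original = [1, 2, 3]
alias = original                # a second name for the same list object
copy_via_list = list(original)  # a new list with the same elements
copy_via_slice = original[:]    # also a new list

same_object = alias is original            # True: one object, two names
copied_object = copy_via_list is original  # False: a separate list

# A change made through the alias is visible through the original name,
# but the copies are unaffected.
alias.append(4)
print(original)
print(copy_via_list)
print(copy_via_slice)
```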

#### Exercises

In [None]:
print(areas)
areas[9] = 10.50
areas[4] = "chill zone"
print(areas)

In [None]:
areas_1 = areas + ["poolhouse", 24.5]
print(areas_1)
areas_2 = areas_1 + ["garage", 15.45]
print(areas_2)

In [None]:
del areas_2[-4:-2]
print(areas_2)

In [None]:
areas_vals = [11.25, 18.0, 20.0, 10.75, 9.50]
areas_copy = list(areas_vals)
areas_copy[0] = 5.0
print(areas_vals)
print(areas_copy)

## Functions & Packages

### Functions

Examples of Python functions include print, str, int, bool, float, max, round, and help.

In [None]:
fam2 = [1.73, 1.68, 1.71, 1.89]
tallest = max(fam2)
print(tallest)

In [None]:
help(round)

In [None]:
height = 1.68
print(round(height, 1))
print(round(height))
print(round(height, 0))
print(round(number=height, ndigits=1))

In [None]:
print(round(568, -2))

#### Exercises

In [None]:
var1 = [1, 2, 3, 4]
var2 = True
print(type(var1))
print(len(var1))
out2 = int(var2)
print(out2)

In [None]:
help(sorted)

In [None]:
first = [11.25, 18.0, 20.0]
second = [10.75, 9.50]
full = first + second
full_sorted = sorted(full, reverse=True)
print(full_sorted)

### Methods

Methods are functions that belong to objects. Examples are str.capitalize, str.replace, int.bit_length, float.conjugate, list.index, and list.count.
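
Numbers are objects too, so they also have methods. The sketch below calls one method each on an int, a float, and a str; the values are arbitrary examples.

```python
whole = 5
fraction = 1.5

# int.bit_length: the number of bits needed to represent the integer
# (5 is 101 in binary).
bits = whole.bit_length()

# float.conjugate: the complex conjugate, which for a real number is
# the number itself.
conj = fraction.conjugate()

# str.capitalize: a new string with the first character in upper case.
title = "liz".capitalize()

print(bits, conj, title)
```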

In [None]:
fam = ['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
print(fam.index("mom"))

In [None]:
fam.count(1.73)

In [None]:
sister = "liz"
print(sister.capitalize())

In [None]:
print(sister.replace("z", "sa"))

In [None]:
print(sister.index("z"))
print(fam.index("mom"))

In [None]:
print(fam)
fam.append("me")
print(fam)
fam.append(1.79)
print(fam)

#### Exercises

In [None]:
place = "poolhouse"
place_up = place.upper()
print(place)
print(place_up)

In [None]:
print(place.count("o"))

In [None]:
print(areas_vals)

In [None]:
print(areas_vals.index(20.0))
print(areas_vals.count(9.50))

In [None]:
# areas_vals.append([24.5, 15.45]) would append the list itself as a single last element.
areas_vals.append(24.5)
areas_vals.append(15.45)
print(areas_vals)
areas_vals.reverse()
print(areas_vals)

### Packages

A package is a directory that contains Python scripts, where each script is a module.

The user needs to install the pip (pip3) utility and use it to install packages such as NumPy.

In a script, the user must import the package or module. All imports are at the top of this notebook.

In [None]:
vals = np.array([1, 2, 3])
print(type(vals))
print(vals)

#### Exercises

In [None]:
r = 0.43
C = 2 * math.pi * r
A = math.pi * (r ** 2)
print("Circumference: " + str(C))
print("Area: " + str(A))

In [None]:
# math.radians converts degrees to radians.
orbit_radius = 192500
dist = orbit_radius * math.radians(12)
print(dist)

This is an example of importing the function inv from the scipy.linalg package:

`from scipy.linalg import inv as my_inv`
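
The same import forms work with the standard library, so they can be tried without installing SciPy. This is a sketch using the math module; the alias to_radians is a hypothetical name chosen for the example.

```python
import math                             # import the whole module
from math import radians                # import a single function
from math import radians as to_radians  # import a function under an alias

# All three calls run the same function.
via_module = math.radians(180)
via_name = radians(180)
half_turn = to_radians(180)

print(via_module, via_name, half_turn)
```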


## NumPy

NumPy is designed to efficiently vectorize calculations on list-like objects. Note that no commas are included in the output when printing the values of an np.ndarray object. NumPy creates arrays of a single type. Some NumPy operators have different behavior from list operators.
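
The single-type rule can be seen by building arrays from mixed values: NumPy converts every element to one common type. This is a small sketch with arbitrary values.

```python
import numpy as np

# Integers, floats, and Booleans are all converted to floats.
mixed_numbers = np.array([1, 2.5, True])
print(mixed_numbers, mixed_numbers.dtype)

# If any element is a string, every element becomes a string.
mixed_text = np.array([1, "two", 3.0])
print(mixed_text, mixed_text.dtype)
```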

### NumPy Arrays

In [None]:
heights = [1.73, 1.68, 1.71, 1.89, 1.79]
weights = [65.4, 59.2, 63.6, 88.4, 68.7]
print(type(heights))
np_heights = np.array(heights)
print(type(np_heights))
print(heights)
print(np_heights)
np_weights = np.array(weights)
print(np_weights)

In [None]:
bmi = np_weights / (np_heights ** 2)
bmi

In [None]:
# Note different behavior of the + operator.
python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])
print(python_list + python_list)
print(numpy_array + numpy_array)

### NumPy Subsetting

In [None]:
bmi[1]

In [None]:
bmi > 23

In [None]:
bmi[bmi > 23]

#### Exercises

In [None]:
baseball = [180, 215, 210, 210, 188, 176, 209, 200]
np_baseball = np.array(baseball)
print(type(np_baseball))

In [None]:
# For all of the baseball data, convert the data to the units we need.
np_bb_heights_m = np_bb_height_inches * 0.0254
np_bb_weights_kg = np_bb_weight_pounds * 0.453592

# Calculate the BMIs.
bmi = np_bb_weights_kg / (np_bb_heights_m ** 2)
print(bmi)

In [None]:
light = bmi < 21
print(light)
print(bmi[light])

In [None]:
# Type coercion. True is coerced to 1, False to 0.
print(np.array([True, 1, 2]) + np.array([3, 4, False]))

In [None]:
# Indexing numpy.ndarray objects.
print(np_bb_weight_pounds[50])

In [None]:
print(np_bb_height_inches[100:111])

### 2D NumPy Arrays

In [None]:
np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                 [65.4, 59.2, 63.6, 88.4, 68.7]])
np_2d

The numpy.ndarray.shape attribute holds the dimensions of the array (rows, columns for a 2-D array).

In [None]:
np_2d.shape

In [None]:
# Indexing a 2-D array.
np_2d[0][2]

In [None]:
# Alternative method for indexing.
np_2d[0, 2]

In [None]:
np_2d[:, 1:3]

In [None]:
np_2d[1, :]

#### Exercises

In [None]:
baseball_small = [[180, 78.4], [215, 102.7], [210, 98.5], [188, 75.2]]
np_baseball_small = np.array(baseball_small)
print(type(np_baseball_small))
print(np_baseball_small.shape)

In [None]:
# Create a 2D array of 1,015 rows with 2 columns per row.
np_baseball = np.column_stack((bb_height_list, bb_weight_list))
print(np_baseball.shape)

In [None]:
print(np_baseball)

In [None]:
# Print the row at index 50 of np_baseball (the 51st row).
print(np_baseball[50])
print(np_baseball[50, :])

In [None]:
# Select the entire second column of np_baseball as np_weight_lb.
np_weight_lb = np_baseball[:, 1]
print(np_weight_lb)

In [None]:
# Print height of 124th player.
print(np_baseball[123][0])

### 2D Arithmetic

In [None]:
np_mat = np.array([[1, 2], [3, 4], [5, 6]])
np_mat

In [None]:
np_mat * 2

In [None]:
np_mat + np.array([10, 10])

In [None]:
np_mat + np_mat

#### Exercises

In [None]:
# This exercise requires a numpy.ndarray containing the height (inches),
# weight (pounds), and age (years) of the players.
# Note that all values were converted to floats.
np_baseball2 = np.array([bb_height_list, bb_weight_list, bb_age_list]).T
np_baseball2

In [None]:
# Convert height from inches to meters and weight from pounds to kilograms.
conversion = np.array([0.0254, 0.453592, 1.0])
print(np_baseball2 * conversion)
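
The multiplication above works because NumPy broadcasts the 1D array of three factors across every row of the 2D array. The sketch below shows the same thing on two hypothetical rows, so the result is easy to check by hand; sample_rows and factors are names made up for this example.

```python
import numpy as np

# Hypothetical rows of height (inches), weight (pounds), age (years).
sample_rows = np.array([[74.0, 180.0, 22.99],
                        [72.0, 215.0, 34.69]])

# One factor per column: inches to meters, pounds to kilograms, age unchanged.
factors = np.array([0.0254, 0.453592, 1.0])

# The shape (3,) array is stretched to match the shape (2, 3) array,
# so each column is multiplied by its own factor.
converted_rows = sample_rows * factors
print(converted_rows)
```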

### Basic Statistics

After loading data into NumPy arrays, it is easy to use NumPy functions to generate summary statistics about the data. Imagine you have collected the heights and weights of 5,000 people and stored them in a 2D NumPy array of 5,000 rows and 2 columns, with heights in meters and weights in kilograms.

The following code demonstrates how to use NumPy to generate simulated data and create some summary statistics.

In [None]:
height = np.round(np.random.normal(1.75, 0.20, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)
np_city = np.column_stack((height, weight))
np_city

In [None]:
height_mean = np.mean(np_city[:, 0])
print(height_mean)

In [None]:
height_median = np.median(np_city[:, 0])
print(height_median)

In [None]:
height_stddev = np.std(np_city[:, 0])
print(height_stddev)

In [None]:
height_max = np.max(np_city[:, 0])
print(height_max)

In [None]:
height_min = np.min(np_city[:, 0])
print(height_min)

In [None]:
# Correlation coefficient of height to weight.
# Height and weight are generated independently in this simulated data,
# so the coefficient should be close to 0.
corrcoef = np.corrcoef(np_city[:, 0], np_city[:, 1])
print(corrcoef)
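
Note that np.corrcoef returns a 2 x 2 correlation matrix, not a single number: the diagonal holds each variable's correlation with itself (1.0), and the off-diagonal entries hold the coefficient between the two variables. The sketch below extracts that value. It uses np.random.default_rng with an arbitrarily chosen seed so the simulated data is reproducible, which is my own addition and not part of the course code.

```python
import numpy as np

# The seed value is arbitrary; it only makes the simulation repeatable.
rng = np.random.default_rng(seed=42)
heights_sim = rng.normal(1.75, 0.20, 5000)
weights_sim = rng.normal(60.32, 15, 5000)

corr_matrix = np.corrcoef(heights_sim, weights_sim)

# The off-diagonal entry is the height-weight correlation coefficient.
r_height_weight = corr_matrix[0, 1]
print(corr_matrix.shape)
print(r_height_weight)
```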

#### Exercises

In [None]:
# In the exercise, the mean height is 1586.46. This is not true of the data
# from the baseball.csv file.
np_height_in = np_baseball[:, 0]
print("Mean height: ", np.mean(np_height_in))
print("Median height: ", np.median(np_height_in))

In [None]:
# Coefficient of correlation.
print("Coefficient of correlation: ", np.corrcoef(np_baseball[:, 0], np_baseball[:, 1]))

In [None]:
# Comparing np_fifa_positions to a string creates a Boolean array, which
# can be used to select the matching elements of np_fifa_heights.
gk_heights = np_fifa_heights[np_fifa_positions == "GK"]
other_heights = np_fifa_heights[np_fifa_positions != "GK"]
print("Median height of goalkeepers: ", str(np.median(gk_heights)))
print("Median height of other players: ", str(np.median(other_heights)))
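
The selection technique used in the last cell can be sketched on a few hypothetical values, which makes the intermediate Boolean array easy to inspect. The sample arrays below are made up and are not taken from fifa.csv.

```python
import numpy as np

# Hypothetical positions and heights (cm) for four players.
sample_positions = np.array(["GK", "ST", "GK", "CB"])
sample_heights = np.array([190, 180, 188, 185])

# The comparison is applied element by element and yields a Boolean array.
is_goalkeeper = sample_positions == "GK"
print(is_goalkeeper)

# Indexing with the Boolean array keeps only the elements where it is True.
gk_median = np.median(sample_heights[is_goalkeeper])

# The ~ operator inverts the Boolean array.
other_median = np.median(sample_heights[~is_goalkeeper])
print(gk_median, other_median)
```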