## **2022-12-15 `03-Python - Day 3 - Python Deeper Dive`**

### **Objectives**

* Create and use Python `dictionaries`.
* Read in data from a `dictionary`.
* Use `list` comprehensions.
* Write and reuse Python `functions`.
* Use coding `logic` and `reasoning`.
* `Add`, `commit`, and `push` code to GitHub from the command line.
---

### **Presentation**
* [Python Deeper Dive](https://ucb.bootcampcontent.com/UCB-Coding-Bootcamp/UCB-VIRT-DATA-PT-11-2022-U-LOLC/-/blob/main/slides/3.3_Deeper_Dive_into_Python.pdf)
---

# ===============================

### 3.01 Students Do: Cereal Cleaner (0:20)

# Instructions
* Open the file `cereal.csv` and start by skipping the header row. See the hints below.
* Read through the remaining rows and find the cereals that contain five grams of fiber or more, printing the data from those rows to the terminal.

  > Hint
    * Everything in the CSV is stored as a string, and some columns contain decimal values, so values must be cast to a numeric type before they can be compared.
    * You may need to specify an encoding when opening the file by passing the `encoding` parameter to `open()` in the `with` statement. Refer to these Stack Overflow posts to decide which encoding to use.
        1. [Weird character added](https://stackoverflow.com/questions/22974765/weird-characters-added-to-first-column-name-after-reading-a-toad-exported-csv-fi)

        2. [Difference between utf-8 and utf-sig](https://stackoverflow.com/questions/57152985/what-is-the-difference-between-utf-8-and-utf-8-sig)
    * The `csv.reader` begins reading the csv file at the first row. `next(csv_reader, None)` will skip the header row. Refer to this stackoverflow answer on [how to skip the header](https://stackoverflow.com/a/14257599) for more information.
    * Integers in Python are whole numbers and cannot contain decimals, so values with decimal points must be cast as a `float`.
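
The hints above can be tried out without touching the real file. The following is a minimal sketch, assuming a made-up two-row sample (`raw_bytes`, with a hypothetical `fiber` column at index 1) held in memory instead of `cereal.csv`. It shows what the byte order mark does to the header, how `next()` skips the header row, and why the cast to `float` is needed.

```python
import csv
import io

# Hypothetical two-row sample; the first three bytes are a UTF-8 byte order mark (BOM).
raw_bytes = b"\xef\xbb\xbfname,fiber\nBran Bits,5.5\nSugar Puffs,0\n"

# Plain "utf-8" keeps the BOM as an invisible character attached to the first header name.
header_with_bom = raw_bytes.decode("utf-8").splitlines()[0]
print(repr(header_with_bom))

# "utf-8-sig" strips the BOM while decoding.
csv_reader = csv.reader(io.StringIO(raw_bytes.decode("utf-8-sig")))

# Skip the header row; the None default avoids an error if the data is empty.
header = next(csv_reader, None)

# Every field is a string, so cast before comparing.
# float() accepts "5.5"; int("5.5") would raise a ValueError.
high_fiber = [row for row in csv_reader if float(row[1]) >= 5]
print(header)
print(high_fiber)
```

With a real file, the same effect comes from passing `encoding="utf-8-sig"` to `open()`.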

## Bonus
* Try the activity again, this time using `cereal_bonus.csv`, which does not include a header.
> **Note:** Refer to the encoder hint in the **hint** section above. 

---

In [1]:
import os
import csv

cereal_csv = os.path.join("01-Stu_CerealCleaner", "Resources", "cereal.csv")

# Open and read csv
with open(cereal_csv, "r", encoding='utf-8-sig') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")

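    # Read the header row first (skip this part if there is no header)
    # Note: next(csv_file) returns the raw header line as a string;
    # next(csv_reader) would return it as a list of column names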
    csv_header = next(csv_file)
    print(f"Header: {csv_header}")

    # Read through each row of data after the header
    for row in csv_reader:

        # Convert the fiber value (column index 7) to float and keep rows with 5 grams or more
        if float(row[7]) >= 5:
            print(row)


Header: name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating

['100% Bran', 'N', 'C', '70', '4', '1', '130', '10', '5', '6', '280', '25', '3', '1', '0.33', '68.402973']
['All-Bran', 'K', 'C', '70', '4', '1', '260', '9', '7', '5', '320', '25', '3', '1', '0.33', '59.425505']
['All-Bran with Extra Fiber', 'K', 'C', '50', '4', '0', '140', '14', '8', '0', '330', '25', '3', '1', '0.5', '93.704912']
['Bran Flakes', 'P', 'C', '90', '3', '0', '210', '5', '13', '5', '190', '25', '3', '1', '0.67', '53.313813']
['Fruit & Fibre Dates; Walnuts; and Oats', 'P', 'C', '120', '3', '2', '160', '5', '12', '10', '200', '25', '3', '1.25', '0.67', '40.917047']
['Fruitful Bran', 'K', 'C', '120', '3', '0', '240', '5', '14', '12', '190', '25', '3', '1.33', '0.67', '41.015492']
['Post Nat. Raisin Bran', 'P', 'C', '120', '3', '1', '200', '6', '11', '14', '260', '25', '3', '1.33', '0.67', '37.840594']
['Raisin Bran', 'K', 'C', '120', '3', '1', '210', '5', '14', '12'

In [2]:
import os
import csv

cereal_csv = os.path.join("01-Stu_CerealCleaner", "Resources", "cereal_bonus.csv")

with open(cereal_csv, encoding='utf-8-sig') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=",")

    # @NOTE: This time, we do not use `next(csv_reader)` because there is no header for this file

    # Read through each row of data (there is no header to skip)
    for row in csv_reader:

        # Convert the fiber value (column index 7) to float and keep rows with 5 grams or more
        if float(row[7]) >= 5:
            print(row)


['100% Bran', 'N', 'C', '70', '4', '1', '130', '10', '5', '6', '280', '25', '3', '1', '0.33', '68.402973']
['All-Bran', 'K', 'C', '70', '4', '1', '260', '9', '7', '5', '320', '25', '3', '1', '0.33', '59.425505']
['All-Bran with Extra Fiber', 'K', 'C', '50', '4', '0', '140', '14', '8', '0', '330', '25', '3', '1', '0.5', '93.704912']
['Bran Flakes', 'P', 'C', '90', '3', '0', '210', '5', '13', '5', '190', '25', '3', '1', '0.67', '53.313813']
['Fruit & Fibre Dates; Walnuts; and Oats', 'P', 'C', '120', '3', '2', '160', '5', '12', '10', '200', '25', '3', '1.25', '0.67', '40.917047']
['Fruitful Bran', 'K', 'C', '120', '3', '0', '240', '5', '14', '12', '190', '25', '3', '1.33', '0.67', '41.015492']
['Post Nat. Raisin Bran', 'P', 'C', '120', '3', '1', '200', '6', '11', '14', '260', '25', '3', '1.33', '0.67', '37.840594']
['Raisin Bran', 'K', 'C', '120', '3', '1', '210', '5', '14', '12', '240', '25', '2', '1.33', '0.75', '39.259197']


# ===============================

### 3.02 Instructor Do: Dictionaries (0:05)

In [3]:
# Unlike lists, dictionaries store information in pairs
# ---------------------------------------------------------------

# Create a dictionary to hold the actors' names.
actors = {}

# Create a dictionary using the built-in function.
actors = dict()

# A dictionary of an actor.
actors = {"name": "Tom Cruise"}
print(f'{actors["name"]}')

# Add an actor to the dictionary with the key "name"
# and the value "Denzel Washington".
actors["name"] = "Denzel Washington"

# Print the actors dictionary.
print(actors)

# Print only the actor.
print(f'{actors["name"]}')

# A list of actors
actors_list = [
    "Tom Cruise",
    "Angelina Jolie",
    "Kristen Stewart",
    "Denzel Washington"]

# Overwrite the value, "Denzel Washington", with the list of actors.
actors["name"] = actors_list

# Print the first actor
print(f'{actors["name"][0]}')

# ---------------------------------------------------------------

# A dictionary can contain multiple pairs of information
actress = {
    "name": "Angelina Jolie",
    "genre": "Action",
    "nationality": "United States"
}

# ---------------------------------------------------------------

# A dictionary can contain multiple types of information
another_actor = {
    "name": "Sylvester Stallone",
    "age": 62,
    "married": True,
    "best movies": [
        "Rocky",
        "Rocky 2",
        "Rocky 3"]}
print(f'{another_actor["name"]} was in {another_actor["best movies"][0]}')
# ---------------------------------------------------------------

# A dictionary can even contain another dictionary
film = {
    "title": "Interstellar",
    "revenues": {
        "United States": 360,
        "China": 250,
        "United Kingdom": 73
    }
}
print(f'{film["title"]} made {film["revenues"]["United States"]} million dollars in the US.')
# ---------------------------------------------------------------


Tom Cruise
{'name': 'Denzel Washington'}
Denzel Washington
Tom Cruise
Sylvester Stallone was in Rocky
Interstellar made 360 million dollars in the US.
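
The lookups above all use square brackets, which raise a `KeyError` when the key is missing. As a small sketch of other ways to read from a dictionary, the example below reuses the `film` dictionary from the cell above; the `"director"` key is a hypothetical key chosen because it is not in the dictionary.

```python
film = {
    "title": "Interstellar",
    "revenues": {"United States": 360, "China": 250, "United Kingdom": 73},
}

# .get() returns a default value instead of raising KeyError for a missing key.
director = film.get("director", "unknown")

# `in` checks whether a key exists.
has_title = "title" in film

# .items() yields (key, value) pairs, which is handy for nested dictionaries.
lines = []
for country, amount in film["revenues"].items():
    lines.append(f"{country}: {amount} million")

# .values() gives just the values, here used to total the revenues.
total_revenue = sum(film["revenues"].values())

print(director, has_title, total_revenue)
print(lines)
```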


# ===============================

### 3.03 Students Do: Hobby-Book - Dictionaries (0:15)

## Instructions

* Create a dictionary to store the following:

  * Your name
  * Your age
  * A list of a few of your hobbies
  * A dictionary of a few times you wake up during the week

* Print out your name, how many hobbies you have and a time you get up during the week.


In [4]:
# Dictionary full of info
my_info = {"name": "Rex",
           "occupation": "dog",
           "age": 21,
           "hobbies": ["barking", "eating", "sleeping", "loving my owner"],
           "wake-up": {"Monday": 5, "Friday": 5, "Saturday": 10, "Sunday": 9}}

# Print out the results stored in the dictionary
print(f'Hello I am {my_info["name"]} and I am a {my_info["occupation"]}')
print(f'I have {len(my_info["hobbies"])} hobbies!')
print(f'On the weekend I get up at {my_info["wake-up"]["Saturday"]}')


Hello I am Rex and I am a dog
I have 4 hobbies!
On the weekend I get up at 10


# ===============================

### 3.04 Everyone Do: List Comprehensions (0:10)

In [5]:
fish = "halibut"

# Loop through each letter in the string
# and append it to a list
letters = []
for letter in fish:
    letters.append(letter)

print(letters)

# List comprehensions provide concise syntax for creating lists
letters = [letter for letter in fish]

print(letters)

# We can manipulate each element as we go
capital_letters = []
for letter in fish:
    capital_letters.append(letter.upper())

print(capital_letters)

# List Comprehension for the above
capital_letters = [letter.upper() for letter in fish]

print(capital_letters)

# We can also add conditional logic (if statements) to a list comprehension
july_temperatures = [87, 85, 92, 79, 106]
hot_days = []
for temperature in july_temperatures:
    if temperature > 90:
        hot_days.append(temperature)
print(hot_days)

# List Comprehension with conditional
hot_days = [temperature for temperature in july_temperatures if temperature > 90]

print(hot_days)


['h', 'a', 'l', 'i', 'b', 'u', 't']
['h', 'a', 'l', 'i', 'b', 'u', 't']
['H', 'A', 'L', 'I', 'B', 'U', 'T']
['H', 'A', 'L', 'I', 'B', 'U', 'T']
[92, 106]
[92, 106]


# ===============================

### 3.05 Students Do: List Comprehensions (0:10)

In this activity, you will use list comprehensions to compose a wedding invitation to send to every name on your mailing list.
## Instructions
* Open the file called `comprehensions.py`.
* Run the provided program. Note that nothing forces you to write the name "properly" (e.g., as "Jane" instead of "jAnE"). You will use list comprehensions to fix this.
  * First, use a list comprehension to create a new list that contains the lowercase version of each of the names your user provided.
  * Then, use a list comprehension to create a new list that contains the capitalized version of each of the names in your lowercased list.
## Bonuses
* Create a thank-you note addressed to each guest using an f-string, and print the statement using the `title()` method.
## Hints
* See the documentation for the [title](https://docs.python.org/3/library/stdtypes.html#str.title) method.
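
The string methods mentioned above behave differently on multi-word strings, which matters for the bonus. The sketch below uses a made-up name (`"jAnE doE"`) to compare them: `capitalize()` only uppercases the first character of the whole string, while `title()` uppercases the first letter of every word and, as the linked documentation notes, treats an apostrophe as a word boundary.

```python
name = "jAnE doE"  # hypothetical user input

lowered = name.lower()            # every character lowercased
capitalized = name.capitalize()   # first character uppercased, the rest lowercased
titled = name.title()             # first letter of each word uppercased

print(lowered)      # jane doe
print(capitalized)  # Jane doe
print(titled)       # Jane Doe

# title() also capitalizes the letter after an apostrophe.
quirk = "they're here".title()
print(quirk)        # They'Re Here
```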
---

In [6]:
names = []
for _ in range(5):
    name = input("Please enter the name of someone you know. ")
    names.append(name)

lowercased = [name.lower() for name in names]
caps = [name.capitalize() for name in lowercased]
invitations = [
    f"Dear {name}, please come to the wedding this Saturday!" for name in caps]

for invitation in invitations:
    print(invitation)

# BONUS
# Create a thank-you note addressed to each guest using an
# f-string, and print it using the `title()` method.

ty_note = [
    f"Dear {name}, thanks for coming and celebrating with us!" for name in caps]
for note in ty_note:
    print(note.title())


Please enter the name of someone you know.  khaled
Please enter the name of someone you know.  karman
Please enter the name of someone you know.  abdo
Please enter the name of someone you know.  ali
Please enter the name of someone you know.  ahmed


Dear Khaled, please come to the wedding this Saturday!
Dear Karman, please come to the wedding this Saturday!
Dear Abdo, please come to the wedding this Saturday!
Dear Ali, please come to the wedding this Saturday!
Dear Ahmed, please come to the wedding this Saturday!
Dear Khaled, Thanks For Coming And Celebrating With Us!
Dear Karman, Thanks For Coming And Celebrating With Us!
Dear Abdo, Thanks For Coming And Celebrating With Us!
Dear Ali, Thanks For Coming And Celebrating With Us!
Dear Ahmed, Thanks For Coming And Celebrating With Us!


# ===============================

### 3.06 Instructor Do: Functions (0:15)

In [7]:
# Basic Definition
def name(parameters):
    # code goes here
    return


# Simple Function with no parameters
def show():
    print("Hi!")


# You use parentheses to run the code in a function
show()


# Simple function with one parameter
def show(message):
    print(message)


# Think of the parameter `message` as a variable
# You assign the string "Hello, World!" when you call the function
# This is like saying `message = "Hello, World!"`
show("Hello, World!")


# Functions can have more than one parameter
def make_quesadilla(protein, topping):
    quesadilla = f"Here is a {protein} quesadilla with {topping}"
    print(quesadilla)


# Supply the arguments (values) when calling the function
make_quesadilla("beef", "guacamole")
make_quesadilla("chicken", "salsa")

# @NOTE: Order is important when supplying arguments!
make_quesadilla("sour cream", "beef")


# We can also specify default values for parameters
def make_quesadilla(protein, topping="sour cream"):
    quesadilla = f"Here is a {protein} quesadilla with {topping}"
    print(quesadilla)


# Make a quesadilla using the default topping
make_quesadilla("chicken")

# Make a quesadilla with a new topping
make_quesadilla("beef", "guacamole")


# Functions can return a value
def square(number):
    return number * number


# You can save the value that is returned
squared = square(2)
print(squared)

# You can also just print the return value of a function
print(square(2))
print(square(3))


Hi!
Hello, World!
Here is a beef quesadilla with guacamole
Here is a chicken quesadilla with salsa
Here is a sour cream quesadilla with beef
Here is a chicken quesadilla with sour cream
Here is a beef quesadilla with guacamole
4
4
9
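
The note above about argument order applies to positional arguments. As a short sketch, passing the arguments by keyword removes the dependence on order. This variation of `make_quesadilla` returns the string instead of printing it (an assumption made here so the two calls can be compared side by side).

```python
def make_quesadilla(protein, topping="sour cream"):
    # Variation for illustration: return the sentence instead of printing it.
    return f"Here is a {protein} quesadilla with {topping}"


# Positional arguments are matched by position, so swapping them swaps the meaning.
positional = make_quesadilla("sour cream", "beef")

# Keyword arguments are matched by name, so the order no longer matters.
keyword = make_quesadilla(topping="sour cream", protein="beef")

print(positional)  # Here is a sour cream quesadilla with beef
print(keyword)     # Here is a beef quesadilla with sour cream
```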


# ===============================

### 3.07 Students Do: Functions (0:10)

In this activity, you will write a function to compute the arithmetic mean (average) for a list of numbers.
## Instructions
* Write a function called `average` that accepts a list of numbers.
  * The function `average` should return the arithmetic [mean](https://en.wikipedia.org/wiki/Arithmetic_mean) (average) for a list of numbers.
* Test your function by calling it with different values and printing the results.
## Hints
* [Arithmetic Mean (Average)](https://en.wikipedia.org/wiki/Arithmetic_mean)
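
One possible shape for this function, offered as a sketch and not as the expected answer, uses the built-in `sum()` and `len()`. The check for an empty input is an added assumption (the instructions do not say what should happen in that case), and `statistics.mean` from the standard library is shown only as a cross-check.

```python
import statistics


def average(numbers):
    # Convert to a list so that any iterable (such as a range) has a length.
    numbers = list(numbers)
    # Assumption: an empty input is treated as an error rather than returning 0.
    if not numbers:
        raise ValueError("average() needs at least one number")
    return sum(numbers) / len(numbers)


print(average([1, 5, 9]))          # 5.0
print(average(range(11)))          # 5.0
print(statistics.mean([1, 5, 9]))  # standard-library cross-check
```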
---

In [8]:
# Write a function that returns the arithmetic average for a list of numbers
def average(numbers):
    length = len(numbers)
    total = 0.0
    for number in numbers:
        total += number
    return total / length


# Test your function with the following:
print(average([1, 5, 9]))
print(average(range(11)))


5.0
5.0


# ===============================

---
### BREAK (0:10)
---

### 3.08 Partners Do: Wrestling With Functions (0:15)

## Instructions
* Analyze the code and CSV provided, looking specifically for what needs to still be added to the application.
* Using the starter code provided, create a function called `print_percentages` which takes in a parameter called `wrestler_data` and does the following:
  * Uses the data stored within `wrestler_data` to calculate the percentage of matches the wrestler won, lost, and drew over the course of a year.
  * Prints out the stats for the wrestler to the terminal.
## Bonus
* Still within the `print_percentages()` function, create a conditional that checks a wrestler's loss percentage and prints "Jobber" to the screen if the number is greater than 50, or "Superstar" otherwise.
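
The percentage arithmetic described above can be sketched separately from the CSV handling. The helper names `match_percentages` and `classify` and the sample record (6 wins, 3 losses, 1 draw) are hypothetical, and the guard for a wrestler with zero matches is an added assumption. The `:.1f` format specifier rounds the printed value to one decimal place.

```python
def match_percentages(wins, losses, draws):
    """Return win, loss, and draw percentages for one wrestler."""
    total_matches = wins + losses + draws
    # Assumption: report zeros for a wrestler with no matches to avoid dividing by zero.
    if total_matches == 0:
        return {"win": 0.0, "loss": 0.0, "draw": 0.0}
    return {
        "win": wins / total_matches * 100,
        "loss": losses / total_matches * 100,
        "draw": draws / total_matches * 100,
    }


def classify(loss_percent):
    """Label a wrestler based on loss percentage."""
    if loss_percent > 50:
        return "Jobber"
    return "Superstar"


# Hypothetical record, not taken from the CSV.
stats = match_percentages(wins=6, losses=3, draws=1)
label = classify(stats["loss"])
print(f"WIN PERCENT: {stats['win']:.1f}")
print(f"LOSS PERCENT: {stats['loss']:.1f}")
print(label)
```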
---

In [9]:
import os
import csv

# Path to collect data from the Resources folder
wrestling_csv = os.path.join('08-Par_WrestlingWithFunctions', 'Resources', 'WWE-Data-2016.csv')


# Define the function and have it accept the 'wrestler_data' as its sole parameter
def print_percentages(wrestler_data):
    # For readability, it can help to assign your values to variables with descriptive names
    name = str(wrestler_data[0])
    wins = int(wrestler_data[1])
    losses = int(wrestler_data[2])
    draws = int(wrestler_data[3])

    # Total matches can be found by adding wins, losses, and draws together
    total_matches = wins + losses + draws

    # Win percent can be found by dividing the total wins by the total matches and multiplying by 100
    win_percent = (wins / total_matches) * 100

    # Loss percent can be found by dividing the total losses by the total matches and multiplying by 100
    loss_percent = (losses / total_matches) * 100

    # Draw percent can be found by dividing the total draws by the total matches and multiplying by 100
    draw_percent = (draws / total_matches) * 100

    # If the loss percentage is over 50, type_of_wrestler is "Jobber". Otherwise it is "Superstar".
    if loss_percent > 50:
        type_of_wrestler = "Jobber"
    else:
        type_of_wrestler = "Superstar"

    # Print out the wrestler's name and their percentage stats
    print(f"Stats for {name}")
    print(f"WIN PERCENT: {win_percent}")
    print(f"LOSS PERCENT: {loss_percent}")
    print(f"DRAW PERCENT: {draw_percent}")
    print(f"{name} is a {type_of_wrestler}")


# Read in the CSV file
with open(wrestling_csv, 'r') as csvfile:

    # Split the data on commas
    csvreader = csv.reader(csvfile, delimiter=',')

    header = next(csvreader)

    # Prompt the user for what wrestler they would like to search for
    name_to_check = input("What wrestler do you want to look for? ")

    # Loop through the data
    for row in csvreader:

        # If the wrestler's name in a row is equal to that which the user input, run the 'print_percentages()' function
        if name_to_check == row[0]:
            print_percentages(row)


What wrestler do you want to look for?  a


# ===============================

### 3.09 Instructor Do: Intro to Git (0:25)

![image.png](attachment:48181f96-dc5b-46e5-b363-8ba24e1b36a4.png)

![image.png](attachment:53cfabe9-8ac6-4dc7-8dbe-2602a3380b7b.png)

# ===============================

### 3.10 Everyone Do: Adding Files from the Command Line (0:10)

```bash
  # Displays the status of files in the folder
  git status

  # Adds all the files to the staging area
  git add .

  # Check that the files were added correctly
  git status

  # Commits the staged files to your repo with a message
  git commit -m "<add commit message here>"

  # Pushes the changes up to GitHub
  git push origin main
  ```


# ===============================

### 3.11 Students Do: Adding more to the repo (0:15)

**Instructions**

  * Using the repo that you just created, make the following changes:

    * Add new lines of code to one of the python files.
    * Create a new folder.
    * Add a file to the newly created folder.
    * Add, commit and push the changes.
    * Delete the new folder.
    * Add, commit and push the changes again.

# ===============================

### 3.12 Extra Do: Additional Activities

**Main Method Model**
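* When a Python file is run as a script, `__name__` is set to `"__main__"`, so code placed under `if __name__ == "__main__":` runs only in that case and not when the file is imported.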
* Reference this link for more information: [Main Method Model](https://docs.python.org/3/library/__main__.html)
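
As a minimal sketch of this pattern, the file below defines a hypothetical `main()` function and calls it only when the file is executed directly. If another file imports this one, `__name__` holds the module's name instead of `"__main__"`, so `main()` is not called automatically but can still be called by the importer.

```python
def main():
    """Hypothetical entry point for this sketch."""
    message = "Running as a script"
    print(message)
    return message


# True only when this file is run directly, for example with `python my_script.py`.
# When the file is imported, the condition is False and main() is skipped.
if __name__ == "__main__":
    main()
```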
---

**Set Operations.**

In [12]:
hero1 = {"smart", "rich", "armored", "martial_artist", "strong"}
hero2 = {"smart", "fast", "strong", "invulnerable", "antigravity"}

print(type(hero1))
print(type(hero2))

In [13]:
# Attributes for both heroes (union)
hero1 | hero2
hero1.union(hero2)

{'antigravity',
 'armored',
 'fast',
 'invulnerable',
 'martial_artist',
 'rich',
 'smart',
 'strong'}

In [14]:
# Attributes common to both heroes (intersection)
hero1 & hero2
hero1.intersection(hero2)

{'smart', 'strong'}

In [15]:
# Attributes in hero1 that are not in hero2 (difference)
hero1 - hero2
hero1.difference(hero2)

{'armored', 'martial_artist', 'rich'}

In [16]:
# Attributes in hero2 that are not in hero1 (difference)
hero2 - hero1
hero2.difference(hero1)

{'antigravity', 'fast', 'invulnerable'}

In [17]:
# Attributes for hero1 or hero2 but not both (symmetric difference)
hero1 ^ hero2
hero1.symmetric_difference(hero2)

{'antigravity', 'armored', 'fast', 'invulnerable', 'martial_artist', 'rich'}

---
# Resume Analysis
In this activity, you will generate a Python script to analyze a resume text file.
## Instructions
* Read the resume file as text using the `with` statement.
* Create a list containing all words in the resume.
  * Convert each word to lowercase to normalize the data.
* Use `split` to remove any trailing punctuation so only words remain.
* Create a set of unique words from the resume using `set()`.
* Use set operations to filter out all remaining punctuation from the set of words.
  * Create a set from `string.punctuation` to use in the difference operation.
* Use the cleaned set (no punctuation) to find all of the words from the resume that match the required skills.
* Use the cleaned set (no punctuation) to find all of the words that match the desired skills.
## Bonuses
* Count the number of occurrences for each word in the resume and print the top 10 occurring words in the resume.
  * Use a dictionary data structure to hold the counts for each word.
  * Make sure to remove punctuation and [stop words](https://en.wikipedia.org/wiki/Stop_words)
## Hints
* Carefully consider when to use a dictionary versus a set when operating on unique and non-unique elements.
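
The steps above can be sketched on a short in-memory string before working with the real resume. Everything in this example is an assumption for illustration: the sample `text`, the one-word `stop_words` set, and the use of `str.strip(string.punctuation)` as an alternative to `split` for trimming punctuation. The set answers "which unique words appear?", while `Counter` (a dictionary subclass) answers "how often does each word appear?".

```python
import string
from collections import Counter

# Hypothetical sample text standing in for the resume file.
text = "Python, SQL, and Excel. Python scripts * Python, HTML, CSS, and Git/GitHub."
stop_words = {"and"}

# strip() removes punctuation from both ends of each token but leaves inner characters alone.
tokens = [token.strip(string.punctuation) for token in text.lower().split()]

# Tokens made only of punctuation (such as "*") become empty strings; drop them and stop words.
words = [token for token in tokens if token and token not in stop_words]

# A set keeps unique words, which is what the skills match needs.
unique_words = set(words)
matched = unique_words & {"excel", "python", "mysql", "statistics"}

# A Counter keeps a count per word, which is what the bonus needs.
word_counts = Counter(words)
top_words = word_counts.most_common(1)

print(matched)
print(top_words)
```

For the bonus, `most_common(10)` would return the ten most frequent words as `(word, count)` pairs.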


In [19]:
# -*- coding: UTF-8 -*-
"""Resume Analysis Module."""

'Resume Analysis Module.'

In [20]:
import os
import string

# Counter is used for the bonus solution
from collections import Counter

In [21]:
# Paths
resume_path = os.path.join("..", "Extra_Content", "ADVANCED_Stu_Resume_Analysis", "Unsolved", "resume.md")

In [22]:
# Skills to match
REQUIRED_SKILLS = {"excel", "python", "mysql", "statistics"}
DESIRED_SKILLS = {"r", "git", "html", "css", "leaflet"}

In [23]:
def load_file(filepath):
    """Helper function to read a file and return the data."""
    with open(filepath, "r") as resume_file_handler:
        return resume_file_handler.read().lower().split()

In [24]:
# Grab the text for a Resume
word_list = load_file(resume_path)

In [25]:
# Create a set of unique words from the resume
resume = set()

In [26]:
# Remove trailing punctuation from words
for token in word_list:
    resume.add(token.split(',')[0].split('.')[0])
print(resume)

{'sets', 'hadoop', 'bootstrap', 'html', 'microsoft', 'with', 'files', 'tableau', 'scripts', 'frank', 'big', 'basic', 'experience', 'education', 'git/github', '#', 'd3', 'software', 'media', 'n', 'html/css', 'pandas', 'machine', 'api', 'mysql', '*', 'writing', 'and', 'open-source', 'pivot', 'performing', 'forecasting', 'algorithms', 'camp', 'mining', 'sql', 'from', 'cloud', 'skills', 'vba', '##', 'creating', 'leaflet', 'apis', 'visualizations', 'apps', 'advanced', 'developing', 'the', 'modeling', 'analyze', 'css', 'in', 'interactions', 'visualization', 'databases', 'contributing', 'analytics', 'aws', 'mongodb', 'front-end', 'tables', 'r', 'boot', 'working', 'graduate', 'intelligence', 'excel', 'business', 'using', 'web', 'statistics', 'data', 'python', 'stein', 'javascript', 'to', 'social', 'learning', 'interests', 'designing'}


In [27]:
# Remove punctuation marks that were read as whole words
punctuation = set(string.punctuation)
resume = resume - punctuation
print(resume)

{'sets', 'hadoop', 'bootstrap', 'html', 'microsoft', 'with', 'files', 'tableau', 'scripts', 'frank', 'big', 'basic', 'experience', 'education', 'git/github', 'd3', 'software', 'media', 'n', 'html/css', 'pandas', 'machine', 'api', 'mysql', 'writing', 'and', 'open-source', 'pivot', 'performing', 'forecasting', 'algorithms', 'camp', 'mining', 'sql', 'from', 'cloud', 'skills', 'vba', '##', 'creating', 'leaflet', 'apis', 'visualizations', 'apps', 'advanced', 'developing', 'the', 'modeling', 'analyze', 'css', 'in', 'interactions', 'visualization', 'databases', 'contributing', 'analytics', 'aws', 'mongodb', 'front-end', 'tables', 'r', 'boot', 'working', 'graduate', 'intelligence', 'excel', 'business', 'using', 'web', 'statistics', 'data', 'python', 'stein', 'javascript', 'to', 'social', 'learning', 'interests', 'designing'}


In [28]:
# Calculate the Required Skills Match using Set Intersection
print(resume & REQUIRED_SKILLS)

{'mysql', 'python', 'statistics', 'excel'}


In [29]:
# Calculate the Desired Skills Match using Set Intersection
print(resume & DESIRED_SKILLS)

{'html', 'css', 'r', 'leaflet'}


In [30]:
# Bonus: Resume Word Count
# ==========================
# Initialize a dictionary with default values equal to zero
word_count = dict.fromkeys(word_list, 0)

In [31]:
# Loop through the word list and count each word.
for word in word_list:
    word_count[word] += 1
print(word_count)

{'#': 1, 'frank': 1, 'n.': 1, 'stein': 1, '##': 4, 'education': 1, '*': 15, 'data': 7, 'analytics': 3, 'and': 8, 'visualization': 2, 'boot': 1, 'camp': 1, 'graduate': 1, 'experience': 1, 'creating': 1, 'pivot': 1, 'tables': 1, 'vba': 1, 'scripts': 2, 'in': 2, 'excel.': 1, 'modeling': 1, 'forecasting': 1, 'using': 5, 'basic': 1, 'statistics': 1, 'writing': 1, 'python': 3, 'to': 2, 'analyze': 1, 'sets': 1, 'from': 1, 'files': 1, 'apis.': 1, 'social': 2, 'media': 2, 'mining': 1, 'working': 3, 'with': 6, 'mysql': 1, 'mongodb': 1, 'databases': 1, 'developing': 1, 'front-end': 1, 'web': 2, 'visualizations': 1, 'html,': 2, 'css,': 2, 'bootstrap,': 1, 'd3,': 1, 'leaflet.js': 1, 'the': 2, 'tableau': 1, 'business': 1, 'intelligence': 1, 'software': 2, 'performing': 1, 'big': 2, 'hadoop': 1, 'machine': 2, 'learning': 1, 'algorithms': 1, 'skills': 1, 'microsoft': 1, 'excel,': 1, 'python,': 1, 'javascript,': 2, 'html/css,': 1, 'api': 1, 'interactions,': 1, 'mining,': 1, 'sql,': 1, 'hadoop,': 1, 'ta

In [32]:
# Bonus using collections.Counter
word_counter = Counter(word_list)
print(word_counter)

Counter({'*': 15, 'and': 8, 'data': 7, 'with': 6, 'using': 5, '##': 4, 'analytics': 3, 'python': 3, 'working': 3, 'visualization': 2, 'scripts': 2, 'in': 2, 'to': 2, 'social': 2, 'media': 2, 'web': 2, 'html,': 2, 'css,': 2, 'the': 2, 'software': 2, 'big': 2, 'machine': 2, 'javascript,': 2, '#': 1, 'frank': 1, 'n.': 1, 'stein': 1, 'education': 1, 'boot': 1, 'camp': 1, 'graduate': 1, 'experience': 1, 'creating': 1, 'pivot': 1, 'tables': 1, 'vba': 1, 'excel.': 1, 'modeling': 1, 'forecasting': 1, 'basic': 1, 'statistics': 1, 'writing': 1, 'analyze': 1, 'sets': 1, 'from': 1, 'files': 1, 'apis.': 1, 'mining': 1, 'mysql': 1, 'mongodb': 1, 'databases': 1, 'developing': 1, 'front-end': 1, 'visualizations': 1, 'bootstrap,': 1, 'd3,': 1, 'leaflet.js': 1, 'tableau': 1, 'business': 1, 'intelligence': 1, 'performing': 1, 'hadoop': 1, 'learning': 1, 'algorithms': 1, 'skills': 1, 'microsoft': 1, 'excel,': 1, 'python,': 1, 'html/css,': 1, 'api': 1, 'interactions,': 1, 'mining,': 1, 'sql,': 1, 'hadoop,'

In [33]:
# Comparing both word count solutions
print(word_count == word_counter)

True


In [34]:
# Top 10 Words
print("Top 10 Words")
print("=============")

Top 10 Words


In [37]:
# Clean Punctuation
_word_count = [word for word in word_count if word not in string.punctuation]


In [38]:
# Clean Stop Words
stop_words = ["and", "with", "using", "##", "working", "in", "to"]
_word_count = [word for word in _word_count if word not in stop_words]

In [39]:
# Sort words by count and print the top 10
for word in sorted(_word_count, key=word_count.get, reverse=True)[:10]:
    print(f"Token: {word:20} Count: {word_count[word]}")

Token: data                 Count: 7
Token: analytics            Count: 3
Token: python               Count: 3
Token: visualization        Count: 2
Token: scripts              Count: 2
Token: social               Count: 2
Token: media                Count: 2
Token: web                  Count: 2
Token: html,                Count: 2
Token: css,                 Count: 2


---
**UUID Generator**
# Instructions
* In this activity, you will generate a universally unique id (UUID) string using functions and module imports.
* See [link](https://stackoverflow.com/questions/292965/what-is-a-uuid) for more info on UUIDs.
* Import the [random](https://docs.python.org/3/library/random.html) and [string](https://docs.python.org/3/library/string.html) modules.
* Create a function that returns a universally unique id (UUID).
  * The function should accept a parameter for uuid length with the default size of 4.
  * The function should accept a parameter for a string of characters.
    * This string of characters will be the alphabet used to generate the uuid.
    * For example, if we pass `'abcdef'`, the uuid can only consist of the letters 'abcdef'.
  * The length and characters parameters should be optional and have default values.
  * Define a default character alphabet using the constants provided by the [string module](https://docs.python.org/3/library/string.html).
  * To select random characters for your uuid, use one of the functions available for sequence selection in the [random module](https://docs.python.org/3/library/random.html) to randomly select a character from the alphabet.
* Complete the test function to generate a variety of UUIDs and print them to the console.
## Hints
* Define a default character alphabet that combines ascii letters with digits.
* The random module has a function for making a random choice from a sequence. See the documentation on [functions for sequences](https://docs.python.org/3/library/random.html#functions-for-sequences).
* The code for the uuid function should create a list, append `length` random characters to the list, and then return the result of using `join` to create a string from it.
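
The loop-and-append approach in the last hint is one way to build the string. As an alternative sketch, `random.choices` (also listed under functions for sequences) draws `length` characters with replacement in a single call. Note that a short random string like this is an exercise in functions and defaults; it is not a standards-compliant UUID, for which the standard library provides the `uuid` module.

```python
import random
import string


def generate_uuid(length=4, characters=string.ascii_letters + string.digits):
    """Return a random string of `length` characters drawn from `characters`."""
    # random.choices returns a list of `length` picks, made with replacement.
    return "".join(random.choices(characters, k=length))


default_id = generate_uuid()
long_id = generate_uuid(length=16)
custom_id = generate_uuid(length=8, characters="abcdef")

print(default_id)
print(long_id)
print(custom_id)
```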


In [41]:
"""UUID Generator.

This module allows us to generate a universally unique identifier (UUID)
with a custom length and character set.

Example:
    $ python uuid.py

"""

# Use import to access code from other modules.
import string
import random


# Use default parameters in our function declaration to allow us to change the length and characters
def generate_uuid(length=4, characters=string.ascii_letters + string.digits):
    """Generate a string of random characters.

    Args:
        length (int, optional): The length of the UUID to generate.
        characters (string, optional): The character set used to build the UUID.

    Returns:
        string: A string representation of the generated UUID.
    """
    # Loop through a range defined by the length size
    # In each loop, make a random choice from our characters and append that to the uuid list
    uuid = []
    for _ in range(length):
        uuid.append(random.choice(characters))
    # Use join to convert the uuid list to a string
    return ''.join(uuid)


def test():
    """Run test code."""

    # Generate a uuid using default values
    uuid = generate_uuid()
    print("UUID using default values: {}".format(uuid))

    # Generate a uuid of length 16 using the default character set
    uuid16 = generate_uuid(length=16)
    print("UUID of length 16: {}".format(uuid16))

    # Generate a uuid of random numbers using the default length
    uuid_random_numbers = generate_uuid(characters=string.digits)
    print("UUID of only numbers: {}".format(uuid_random_numbers))

    # Generate a uuid consisting of only letters
    uuid_random_letters = generate_uuid(characters=string.ascii_letters)
    print("UUID of only letters: {}".format(uuid_random_letters))

    # Generate a uuid of length 8 that includes punctuation in the character set
    uuid_with_punctuation = generate_uuid(
        length=8,
        characters=string.ascii_letters + string.digits + string.punctuation)
    print("UUID with punctuation: {}".format(uuid_with_punctuation))


# This conditional will execute the test function when running as a script.
# https://docs.python.org/3/library/__main__.html
if __name__ == '__main__':
    test()


UUID using default values: lofK
UUID of length 16: kZMk9f95OGOXRZhg
UUID of only numbers: 3040
UUID of only letters: cjhx
UUID with punctuation: EX)"CT)?


# ===============================

### Rating Class Objectives

* Rate your understanding of each objective on a scale of 1 to 5.

In [None]:
title = "03-Python - Day 3 - Python Deeper Dive"
objectives = [
    "Add, commit, and push code up to GitHub from the command line",
    "Create and use Python dictionaries",
    "Read data in from a dictionary",
    "Use list comprehensions",
    "Write and reuse Python functions",
    "Have a firm understanding of coding logic and reasoning",
]
rating = []
total = 0
for i in range(len(objectives)):
    rate = input(objectives[i]+"? ")
    total += int(rate)
    rating.append(objectives[i] + ". (" + rate + "/5)")
print("="*96)
print(f"Self Evaluation for: {title}")
print("-"*24)
for i in rating:
    print(i)
print("-"*64)
print("Average: " + str(total/len(objectives)))