# Python Data Types

## [Lists](https://docs.python.org/3/tutorial/datastructures.html?highlight=list)

- Use [`.append()`](https://docs.python.org/3/library/array.html?highlight=append#array.array.append) to **add** a single element to the end of a list.

- Use [`.extend()`](https://docs.python.org/3/library/array.html?highlight=append#array.array.extend) to **combine** a list with the elements of any other iterable (list, set, tuple).

- Use [`.index()`](https://docs.python.org/3/library/array.html?highlight=append#array.array.index) to **find** the position of the first occurrence of an item in a list.

- Use [`.pop()`](https://docs.python.org/3/library/array.html?highlight=append#array.array.pop) to **remove** an item by its index; it also returns the removed item.

In [1]:
# Create a list containing the names: baby_names
baby_names = ['Ximena', 'Aliza', 'Ayden', 'Calvin']

# Extend baby_names with 'Rowen' and 'Sandeep'
baby_names.extend(['Rowen', 'Sandeep'])

# Print baby_names
print(baby_names)

# Find the position of 'Aliza': position
position = baby_names.index('Aliza')

# Remove 'Aliza' from baby_names
baby_names.pop(position)

# Print baby_names
print(baby_names)

['Ximena', 'Aliza', 'Ayden', 'Calvin', 'Rowen', 'Sandeep']
['Ximena', 'Ayden', 'Calvin', 'Rowen', 'Sandeep']
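
The cell above does not use `.append()`. The following is a minimal sketch of how `.append()` differs from `.extend()`, and of the value that `.pop()` returns. The short `names` and `nested` lists are illustrative only.

```python
# Illustrative list, not the full list from the cell above
names = ['Ximena', 'Aliza']

# .append() adds its argument as ONE element
names.append('Ayden')

# .extend() adds each element of the iterable individually
names.extend(('Calvin', 'Rowen'))
print(names)

# Appending a list nests it instead of merging it
nested = ['Ximena']
nested.append(['Aliza', 'Ayden'])
print(nested)

# .pop() removes the item at the given index and returns it
removed = names.pop(names.index('Aliza'))
print(removed, names)
```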


## Looping over lists
- Use a `for` loop to iterate through all the items in a list.
- Use the [`sorted()` **function**](https://docs.python.org/3/library/functions.html#sorted) to sort the items of a list from lowest to highest (numerically for numbers, lexicographically for strings). It **returns a new list** and does not modify the list you passed in.

In [2]:
# List of lists
records = [['2014', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Aarya', '10', '40'], 
           ['2014', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Abby', '27', '23'], 
           ['2014', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Abigail', '31', '19'], 
           ['2014', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Aisha', '18', '32'], 
           ['2014', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Aiza', '19', '31'], 
           ['2014', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'Aleena', '17', '33']]

# Create the empty list: baby_names
baby_names = []

# Loop over records 
for row in records:
    # Add the name to the list
    baby_names.append(row[3])
    
# Sort the names in alphabetical order
for name in sorted(baby_names):
    # Print each name
    print(name)

Aarya
Abby
Abigail
Aisha
Aiza
Aleena


## [Tuples](https://docs.python.org/3/tutorial/datastructures.html?highlight=list#tuples-and-sequences)  

- A tuple holds **several items that cannot be modified** once it is created.  

- Often used to represent records, such as rows from a database.  

- Tuples can be "**unpacked**" into multiple variables such as `type, count = ('chocolate chip cookies', 15)` that will set `type` to `'chocolate chip cookies'` and `count` to `15`.  

- Use the [`zip()` **function**](https://docs.python.org/3/library/functions.html#zip) to **pair up multiple iterables**. In Python 3 it returns an iterator of tuples, each containing one element from each iterable passed into `zip()`, and it stops at the shortest input. Wrap it in `list()` to get a list of tuples.  

- Use the [`enumerate()` **function**](https://docs.python.org/3/library/functions.html#enumerate) to **track your position in the list**. On each iteration it yields the index of the current item together with the item itself.  


In [3]:
girl_names = ['JADA', 'Emily', 'Ava', 'SERENITY', 'Claire', 'SOPHIA', 'Sarah']
boy_names = ['JOSIAH', 'ETHAN', 'Jayden', 'MASON', 'RYAN', 'CHRISTIAN']

# Pair up the girl and boy names: pairs
pairs = list(zip(girl_names, boy_names))

# Iterate over pairs
for idx, pair in enumerate(pairs):
    # Unpack pair: girl_name, boy_name
    girl_name, boy_name = pair
    # Print the rank and names associated with each rank
    print('Rank {}: {} and {}'.format(idx, girl_name, boy_name))

Rank 0: JADA and JOSIAH
Rank 1: Emily and ETHAN
Rank 2: Ava and Jayden
Rank 3: SERENITY and MASON
Rank 4: Claire and RYAN
Rank 5: SOPHIA and CHRISTIAN
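
In the cell above, `girl_names` has seven entries and `boy_names` has six, so `'Sarah'` never appears in the output, and the ranks start at 0. The sketch below shows both behaviours on shortened lists and uses the `start` argument of `enumerate()` to count from 1. It is an illustration, not a change to the original exercise.

```python
girl_names = ['JADA', 'Emily', 'Ava']
boy_names = ['JOSIAH', 'ETHAN']

# zip() returns a lazy iterator; wrap it in list() to see the tuples.
# 'Ava' is dropped because boy_names is shorter.
pairs = list(zip(girl_names, boy_names))
print(pairs)

# start=1 makes enumerate() count from 1 instead of 0
ranked = []
for rank, (girl_name, boy_name) in enumerate(pairs, start=1):
    ranked.append('Rank {}: {} and {}'.format(rank, girl_name, boy_name))
print(ranked)
```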


## [Sets](https://docs.python.org/3/tutorial/datastructures.html?highlight=list#sets)  

Main characteristics:  
- Unique,  
- Unordered,  
- Mutable,  
- Python's implementation of Set Theory from Mathematics.  



- The `.union()` **method** returns a set of all the names found in the set you used the method on plus any sets passed as arguments to the method.
- The `.intersection()` **method** looks for overlapping data in sets. It will return an empty set if nothing matches.
- The `.difference()` **method** returns all the items found in one set but not another.
- Use the `.add()` **method** to add an item to a set. Adding an item that already exists leaves the set unchanged.


In [2]:
baby_names_2011 = {'Lillian', 'Leyla', 'Lilly', 'Allison', 
                   'Nikolas', 'Jaylin', 'Samiya', 'Gemma', 
                   'Hayden', 'Isabel', 'Emanuel', 'Katelynn', 
                   'Alana', 'Londyn', 'Reid', 'Cole', 
                   'Gabriel', 'Luca', 'Tiffany', 'Ezekiel'}

baby_names_2014 = {'Lillian', 'Leyla', 'Mae', 'Lilly', 
                   'Juniper', 'Allison', 'Jaylin', 'Monica', 
                   'Journee', 'Cormac', 'Trany', 'Gemma', 
                   'Hayden', 'Nashla', 'Isabel', 'Emanuel', 
                   'Phoenix', 'Alana', 'Londyn', 'Reid'}

# Find the union: all_names
all_names = baby_names_2011.union(baby_names_2014)

# Print the count of names in all_names
print(len(all_names))

# Find the intersection: overlapping_names
overlapping_names = baby_names_2011.intersection(baby_names_2014)

# Print the count of names in overlapping_names
print(len(overlapping_names))

28
12
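
A small sketch of two points from the list above: `.add()` ignores duplicates, and `.difference()` depends on which set you call it on. The three-name sets are shortened, illustrative versions of the ones in the cell.

```python
names_2011 = {'Lillian', 'Leyla', 'Nikolas'}
names_2014 = {'Lillian', 'Leyla', 'Mae'}

# Adding an item that is already present leaves the set unchanged
names_2011.add('Lillian')
size_after_add = len(names_2011)
print(size_after_add)

# .difference() is not symmetric
only_2011 = names_2011.difference(names_2014)
only_2014 = names_2014.difference(names_2011)
print(only_2011, only_2014)

# For set operands, the operators |, & and - match the methods
same_union = (names_2011 | names_2014) == names_2011.union(names_2014)
print(same_union)
```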


In [18]:
baby_names_2014 = {'Yosef', 'Amira', 'Lauryn', 'Jayleen', 'Aydin', 'Finn', 'Michael', 'Uriel', 
                   'Adelaide', 'Sean', 'Valerie', 'Ibrahima', 'Montserrat', 'Guadalupe', 'Carina', 'Eddy', 
                   'Jesus', 'Joy', 'Ashton', 'Moses', 'Ezekiel', 'Brayden', 'Rocco', 'Arthur', 
                   'Kira', 'Ayleen', 'Alana', 'Bridget', 'Louisa', 'Selena', 'Aisha', 'Londyn', 
                   'Harper', 'Carmen', 'Raymond', 'Lev', 'Juliana', 'Alexa', 'Hadassah', 'Sergio', 
                   'Jimena', 'Cayden', 'Inaya', 'Nahla', 'Zion', 'Issac', 'Tzvi', 'Danna', 
                   'Annabella', 'Nikita', 'Kimi', 'Helena', 'Mason', 'Eden', 'Etty', 'Sophie', 
                   'Liv', 'Wolf', 'Kelly', 'Malachi', 'Fatou', 'Ahmed', 'Jeremy', 'Arlo', 
                   'Serenity', 'Gitty', 'Michaela', 'Ismael', 'Ashley', 'Emma', 'Kingsley', 'Adam', 
                   'Nathaly', 'Randy', 'Jazlyn', 'Yasmin', 'Jessica', 'Ali', 'Travis', 'Lailah', 
                   'Ian', 'Brian', 'Hersh', 'Yousef', 'Vanessa', 'Noor', 'Aniyah', 'Allyson', 
                   'Johnny', 'Brooklyn', 'Luka', 'Cora', 'Elisheva', 'Zoe', 'Alessia', 'Henry', 
                   'Hershy', 'Mindy', 'Sabrina', 'Jocelyn', 'Sylvia', 'Robert', 'Kylee', 'Iker', 
                   'Carolina', 'Avraham', 'Brandon', 'Kayden', 'Aileen', 'Elizabeth', 'Archer', 
                   'Yaseen', 'Syeda', 'Aaron', 'Lillian', 'Francis', 'Yidel', 'Salma', 'Reuben'}

records = [['2011', 'FEMALE', 'HISPANIC', 'Geraldine', '13', '75'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Gia', '21', '67'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Gianna', '49', '42'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Giselle', '38', '51'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Grace', '36', '53'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Guadalupe', '26', '62'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Hailey', '126', '8'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Haley', '14', '74'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Hannah', '17', '71'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Haylee', '17', '71'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Hayley', '13', '75'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Hazel', '10', '78'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Heaven', '15', '73'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Heidi', '15', '73'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Heidy', '16', '72'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Helen', '13', '75'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Imani', '11', '77'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Ingrid', '11', '77'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Irene', '11', '77'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Iris', '10', '78'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Isabel', '28', '60'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Isabela', '10', '78'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Isabella', '331', '1'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Isabelle', '18', '70'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Isis', '13', '75'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Itzel', '27', '61'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Izabella', '23', '65'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Jacqueline', '30', '58'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Jada', '21', '67'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Jade', '50', '41'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Jaelynn', '11', '77'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Jamie', '11', '77'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Janelle', '12', '76'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Jaslene', '11', '77'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Jasmin', '20', '68'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Jasmine', '41', '48'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Jayda', '10', '78'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Jayla', '33', '55'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Jaylah', '12', '76'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Jayleen', '51', '40'], 
           ['2011', 'FEMALE', 'HISPANIC', 'Jaylene', '22', '66']]

# Create the empty set: baby_names_2011
baby_names_2011 = set()

# Loop over records and add the names from 2011 to the baby_names_2011 set
for row in records:
    # Check if the first column is '2011'
    if row[0] == '2011':
        # Add the fourth column to the set
        baby_names_2011.add(row[3])

# Find the difference between 2011 and 2014: differences
differences = baby_names_2011.difference(baby_names_2014)

# Print the differences
print(differences)

{'Isabela', 'Geraldine', 'Heidy', 'Ingrid', 'Hailey', 'Heaven', 'Izabella', 'Isabel', 'Iris', 'Hannah', 'Jade', 'Hayley', 'Jada', 'Jayda', 'Isis', 'Jayla', 'Gia', 'Irene', 'Jacqueline', 'Giselle', 'Helen', 'Gianna', 'Jasmine', 'Grace', 'Janelle', 'Jaylah', 'Heidi', 'Hazel', 'Haley', 'Jaelynn', 'Imani', 'Isabella', 'Jasmin', 'Itzel', 'Jaslene', 'Jamie', 'Haylee', 'Jaylene', 'Isabelle'}


## [Dictionaries](https://docs.python.org/3/tutorial/datastructures.html?highlight=list#dictionaries)  
  
Main characteristics:  
- Dictionaries are **indexed by keys**, which can be any **immutable type**; **strings and numbers** can always be keys.
- **Tuples** can be used as **keys** if they contain **only strings, numbers, or tuples**.  
- You **can’t** use **lists as keys**, since lists can be modified in place using index assignments, slice assignments, or methods like `append()` and `extend()`.
- Create a dictionary with `dict()` or `{}`.  
- Meaningful key names make the code easier to read and the data easier to access.  


- Use the `.get(key_name, default)` **method** to access a key without raising a `KeyError`; it returns `None`, or the optional `default` value, when the key is missing.   
- Use the `.keys()` **method** to show the keys of a dictionary.  
- Use the [`sorted()` **function**](https://docs.python.org/3/library/functions.html#sorted) to organize data by sorting the keys of the dictionary. Reverse the order by passing `reverse=True` as a keyword argument.  
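
The rules about keys can be checked directly. The sketch below uses small, illustrative dictionaries to show that a tuple works as a key while a list raises `TypeError`, and how `.get()` and `sorted()` behave.

```python
# Strings, numbers and tuples of immutables are hashable, so they work as keys
ranks = {('2012', 'FEMALE'): 'EMMA', 2: 'LEAH', 'third': 'SARAH'}

# A list is mutable and unhashable, so using it as a key raises TypeError
try:
    ranks[['2012', 'FEMALE']] = 'EMMA'
    list_key_failed = False
except TypeError:
    list_key_failed = True
print(list_key_failed)

# .get() returns None (or the supplied default) instead of raising KeyError
missing = ranks.get('missing')
fallback = ranks.get('missing', 'Not Found')
print(missing, fallback)

# sorted() on a dictionary sorts its keys; reverse=True flips the order
by_rank = {3: 'SARAH', 1: 'EMMA', 2: 'LEAH'}
descending = sorted(by_rank, reverse=True)
print(descending)
```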

In [2]:
# Create a dictionary with female baby names by rank
female_baby_names_2012 = {1: 'EMMA', 2: 'LEAH', 3: 'SARAH', 4: 'SOPHIA',
                     5: 'ESTHER', 6: 'RACHEL', 7: 'CHAYA', 8: 'AVA',
                     9: 'CHANA', 10: 'MIRIAM', 11: 'ELLA', 12: 'EMILY',
                     13: 'MIA', 14: 'SARA', 15: 'CHARLOTTE', 16: 'ISABELLA',
                     17: 'MAYA', 18: 'ELIZABETH', 19: 'ABIGAIL', 20: 'ALEXANDRA'}

# Create an empty dictionary: names_by_rank
names_by_rank = dict()

# Loop over the girl names
for rank, name in female_baby_names_2012.items():
    # Add each name to the names_by_rank dictionary using rank as the key
    names_by_rank[rank] = name
    
# Sort the names_by_rank dict by rank in descending order and slice the first 10 items
for rank in sorted(names_by_rank, reverse = True)[:10]:
    # Print each item
    print(names_by_rank[rank])

ALEXANDRA
ABIGAIL
ELIZABETH
MAYA
ISABELLA
CHARLOTTE
SARA
MIA
EMILY
ELLA


In [3]:
names = {1: 'EMMA', 2: 'LEAH', 3: 'SARAH', 4: 'SOPHIA',
                     5: 'ESTHER', 6: 'RACHEL', 7: 'CHAYA', 8: 'AVA',
                     9: 'CHANA', 10: 'MIRIAM', 11: 'ELLA', 12: 'EMILY',
                     13: 'MIA', 14: 'SARA', 15: 'CHARLOTTE', 16: 'ISABELLA',
                     17: 'MAYA', 18: 'ELIZABETH', 19: 'ABIGAIL', 20: 'ALEXANDRA'}

# Safely print rank 7 from the names dictionary
print(names.get(7))

# Safely print the type of rank 100 from the names dictionary
print(type(names.get(100)))

# Safely print rank 105 from the names dictionary or 'Not Found'
print((names.get(105, 'Not Found')))

CHAYA
<class 'NoneType'>
Not Found


### Nested Dictionaries  

- A dictionary can contain another dictionary **as the value of a key**.  
- This is a very common way to deal with **repeating data structures** such as yearly, monthly or weekly data.  
- Access a nested value with `dictionary_name[first_key_name][second_key_name]`.  

**NOTE**: nested dictionaries can be accessed using multiple indices or the `.get()` method.  

In [1]:
boy_names = {2012: {},
             2013: {1: 'David', 2: 'Joseph', 3: 'Michael', 4: 'Moshe', 5: 'Daniel',
                    6: 'Benjamin', 7: 'James', 8: 'Jacob', 9: 'Jack', 10: 'Alexander'},
             2014: {1: 'Joseph', 2: 'David', 3: 'Michael', 4: 'Moshe', 5: 'Jacob',
                    6: 'Benjamin', 7: 'Alexander', 8: 'Daniel', 9: 'Samuel', 10: 'Jack'}}

# Print a list of keys from the boy_names dictionary
print(boy_names.keys())

# Print a list of keys from the boy_names dictionary for the year 2013
print(boy_names[2013].keys())

# Loop over the dictionary
for year in boy_names:
    # Safely print the year and the third ranked name or 'Unknown'
    print(year, boy_names[year].get(3, 'Unknown'))

dict_keys([2012, 2013, 2014])
dict_keys([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
2012 Unknown
2013 Michael
2014 Michael


### Adding and extending dictionaries  

- If the dictionary is nested, then all the keys in the data path must exist, and each key in the path must be assigned individually.  
- Use the [`update()` **method**](https://docs.python.org/3/library/stdtypes.html?highlight=update#dict.update) to update a dictionary with keys and values from another dictionary, an iterable of key/value tuples, or keyword arguments.  
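
As a rough illustration of both points, the sketch below first shows that assigning through a missing outer key fails, then calls `.update()` with each of the three accepted argument forms. The small `boy_names` dictionary is a cut-down stand-in for the one used in this section.

```python
boy_names = {2012: {}}

# Assigning into a missing outer key fails: the path must exist first
try:
    boy_names[2011][1] = 'Michael'
except KeyError:
    # Create the inner dictionary, then assign into it
    boy_names[2011] = {}
    boy_names[2011][1] = 'Michael'

# .update() accepts another dictionary, ...
boy_names[2012].update({1: 'Casey'})
# ... an iterable of (key, value) tuples, ...
boy_names[2012].update([(2, 'Aiden')])
# ... or keyword arguments (the keys then become strings)
boy_names[2012].update(third='Jacob')
print(boy_names)
```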


In [3]:
boy_names = {2012: {}, 
             2013: {1: 'David', 2: 'Joseph', 3: 'Michael', 4: 'Moshe', 5: 'Daniel',
                    6: 'Benjamin', 7: 'James', 8: 'Jacob', 9: 'Jack', 10: 'Alexander'},
             2014: {1: 'Joseph', 2: 'David', 3: 'Michael', 4: 'Moshe', 5: 'Jacob',
                    6: 'Benjamin', 7: 'Alexander', 8: 'Daniel', 9: 'Samuel', 10: 'Jack'}}

names_2011 = {1: 'Michael', 2: 'Joseph', 3: 'Jacob', 4: 'David', 5: 'Benjamin', 
              6: 'Moshe', 7: 'Daniel', 8: 'Alexander', 9: 'Matthew', 10: 'Jack'}

# Assign the names_2011 dictionary as the value to the 2011 key of boy_names
boy_names[2011] = names_2011

# Update the 2012 key in the boy_names dictionary
boy_names[2012].update([(1, 'Casey'), (2, 'Aiden')])

# Loop over the years in the boy_names dictionary 
for year in boy_names:
    # Sort the data for each year by descending rank and get the lowest one
    lowest_ranked =  sorted(boy_names[year], reverse=True)[0]
    # Safely print the year and the least popular name or 'Not Available'
    print(year, boy_names[year].get(lowest_ranked, 'Not Available'))

2012 Aiden
2013 Alexander
2014 Jack
2011 Jack


### Popping and deleting dictionaries  
- Use `del` to remove keys and values from a dictionary. It throws a `KeyError` if the key does not exist.
- Use the `.pop()` **method** to remove a key and return its value into another variable for further processing. Passing a default value as the second argument lets you safely deal with missing keys.  
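
The difference between the two is easiest to see with a key that does not exist. This is a minimal sketch on a shortened, illustrative `female_names` dictionary.

```python
female_names = {2013: {1: 'Olivia'}, 2014: {1: 'Olivia'}}

# del raises KeyError for a missing key, so guard it or catch the error
try:
    del female_names[2015]
    del_failed = False
except KeyError:
    del_failed = True

# .pop() with a default returns the default when the key is missing
names_2015 = female_names.pop(2015, {})

# .pop() on an existing key removes it and returns its value
names_2013 = female_names.pop(2013)
print(del_failed, names_2015, names_2013, female_names)
```

Note that `.pop()` without a default also raises `KeyError` for a missing key; only the two-argument form is safe.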

In [1]:
female_names = {2011: {1: 'Olivia', 2: 'Esther', 3: 'Rachel', 4: 'Leah', 5: 'Emma',
                       6: 'Chaya', 7: 'Sarah', 8: 'Sophia', 9: 'Ava', 10: 'Miriam'},
                2012: {},
                2013: {1: 'Olivia', 2: 'Emma', 3: 'Esther', 4: 'Sophia', 5: 'Sarah',
                       6: 'Leah', 7: 'Rachel', 8: 'Chaya', 9: 'Miriam', 10: 'Chana'},
                2014: {1: 'Olivia', 2: 'Esther', 3: 'Rachel', 4: 'Leah', 5: 'Emma',
                       6: 'Chaya', 7: 'Sarah', 8: 'Sophia', 9: 'Ava', 10: 'Miriam'}}

# Remove 2011 from female_names and store it: female_names_2011
female_names_2011 = female_names.pop(2011)

# Safely remove 2015 from female_names with an empty dictionary as the default: female_names_2015
female_names_2015 = female_names.pop(2015,{})

# Delete 2012 from female_names
del female_names[2012]

# Print female_names
print(female_names)

{2013: {1: 'Olivia', 2: 'Emma', 3: 'Esther', 4: 'Sophia', 5: 'Sarah', 6: 'Leah', 7: 'Rachel', 8: 'Chaya', 9: 'Miriam', 10: 'Chana'}, 2014: {1: 'Olivia', 2: 'Esther', 3: 'Rachel', 4: 'Leah', 5: 'Emma', 6: 'Chaya', 7: 'Sarah', 8: 'Sophia', 9: 'Ava', 10: 'Miriam'}}


In [3]:
baby_names = {2012: {},
              2013: {1: 'David', 2: 'Joseph', 3: 'Michael', 4: 'Moshe', 5: 'Daniel',
                     6: 'Benjamin', 7: 'James', 8: 'Jacob', 9: 'Jack', 10: 'Alexander'},
              2014: {1: 'Joseph', 2: 'David', 3: 'Michael', 4: 'Moshe', 5: 'Jacob',
                     6: 'Benjamin', 7: 'Alexander', 8: 'Daniel', 9: 'Samuel', 10: 'Jack'}}

# Iterate over the 2014 nested dictionary
for rank, name in baby_names[2014].items():
    # Print rank and name
    print(rank, name)
    
# Iterate over the 2012 nested dictionary
for rank, name in baby_names[2012].items():
    # Print rank and name
    print(rank, name)

1 Joseph
2 David
3 Michael
4 Moshe
5 Jacob
6 Benjamin
7 Alexander
8 Daniel
9 Samuel
10 Jack


**NOTE**: Using the `.items()` **method** to iterate over dictionaries is something you'll be doing very frequently in Python.

In [2]:
baby_names = {2012: {},
              2013: {1: 'David', 2: 'Joseph', 3: 'Michael', 4: 'Moshe', 5: 'Daniel',
                     6: 'Benjamin', 7: 'James', 8: 'Jacob', 9: 'Jack', 10: 'Alexander'},
              2014: {1: 'Joseph', 2: 'David', 3: 'Michael', 4: 'Moshe', 5: 'Jacob',
                     6: 'Benjamin', 7: 'Alexander', 8: 'Daniel', 9: 'Samuel', 10: 'Jack'}}

# Check to see if 2011 is in baby_names
if 2011 in baby_names:
    # Print 'Found 2011'
    print('Found 2011')
    
# Check to see if rank 1 is in 2012
if 1 in baby_names[2012]:
    # Print 'Found Rank 1 in 2012' if found
    print('Found Rank 1 in 2012')
else:
    # Print 'Rank 1 missing from 2012' if not found
    print('Rank 1 missing from 2012')
    
# Check to see if Rank 5 is in 2013
if 5 in baby_names[2013]:
    # Print 'Found Rank 5'
    print('Found Rank 5')

Rank 1 missing from 2012
Found Rank 5


## [CSV Reader](https://docs.python.org/3/library/csv.html#module-csv)  

- Use the `open()` **function** to create a Python file object, which accepts a **file name and a mode**. The mode is typically `'r'` for **read** or `'w'` for **write**.  

- The [`csv.reader()`](https://docs.python.org/3/library/csv.html#csv.reader) **function** of the `csv` module takes a Python file object and returns each line of the file as a **list of strings**. Use it as you would any other **iterable**.  

- CSV files may have a header row with field names. A reader object cannot be sliced directly, so to **skip the header row** either call `next()` on the reader once, or convert the reader to a list and use slice notation such as `[1:]`.  

- `.close()` **method** closes file objects.  
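
The steps above can be sketched without a file on disk. The example below uses `io.StringIO` as an in-memory stand-in for the file object, and the two sample rows and their column layout are hypothetical. It skips the header with `next()` and relies on a `with` block to close the file object.

```python
import csv
import io

# In-memory stand-in for a CSV file (hypothetical sample rows)
csvfile = io.StringIO(
    'BIRTH_YEAR,GENDER,NAME,COUNT,RANK\n'
    '2011,FEMALE,GERALDINE,13,75\n'
    '2012,FEMALE,GIA,21,67\n'
)

baby_names = {}
with csvfile:  # the with block closes the file object on exit
    reader = csv.reader(csvfile)
    header = next(reader)  # consume the header row
    for row in reader:
        # Each row is a list of strings
        baby_names[row[4]] = row[2]

print(header)
print(baby_names)
```

With a real file, the same pattern would start with `with open(file_path, 'r', newline='') as csvfile:`; the `csv` documentation recommends `newline=''` when opening files for the reader.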

In [1]:
# Import the python CSV module
import csv

# Set the path of the csv file
file_path = '/Users/luismoreno/Documents/R/datacamp/career tracks/python programmer/baby_names.csv'

# Create an empty dictionary
baby_names = {}

# Create a python file object in read mode for the baby_names.csv file: csvfile
csvfile = open(file_path, 'r')

# Loop over a csv reader on the file object
for row in csv.reader(csvfile):
    # Print each row 
    print(row)
    # Add the rank and name to the dictionary
    baby_names[row[5]] = row[3]

# Close the file object
csvfile.close()

# Print the dictionary keys
print(baby_names.keys())

['BRITH_YEAR', 'GENDER', 'ETHNICTY', 'NAME', 'COUNT', 'RANK']
['2011', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'GERALDINE', '13', '75']
['2012', 'FEMALE', 'BLACK NON HISPANIC', 'GIA', '21', '67']
['2011', 'FEMALE', 'WHITE NON HISP', 'GIANNA', '49', '42']
['2012', 'FEMALE', 'WHITE NON HISPANIC', 'GISELLE', '38', '51']
['2013', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'GRACE', '36', '53']
['2014', 'FEMALE', 'BLACK NON HISPANIC', 'GUADALUPE', '26', '62']
['2011', 'FEMALE', 'ASIAN AND PACI', 'HAILEY', '126', '8']
['2011', 'FEMALE', 'BLACK NON HISP', 'HALEY', '14', '74']
['2012', 'FEMALE', 'WHITE NON HISP', 'HANNAH', '17', '71']
['2013', 'FEMALE', 'WHITE NON HISPANIC', 'HAYLEE', '17', '71']
['2014', 'FEMALE', 'ASIAN AND PACIFIC ISLANDER', 'HAYLEY', '13', '75']
['2012', 'FEMALE', 'ASIAN AND PACI', 'HAZEL', '10', '78']
['2013', 'FEMALE', 'BLACK NON HISP', 'HEAVEN', '15', '73']
['2014', 'FEMALE', 'HISPANIC', 'HEIDI', '15', '73']
dict_keys(['RANK', '75', '67', '42', '51', '53', '62', '8', '74', 

In [4]:
import os
print(os.getcwd())

/Users/luismoreno/Documents/R/datacamp/career tracks/python programmer


### [DictReader](https://docs.python.org/3/library/csv.html#csv.DictReader)  
- Use `csv.DictReader` to read each row of a CSV file directly into a dictionary.  
- If the file has a **header row**, it will automatically be used as the keys for each dictionary. If not, a list of keys can be supplied with the `fieldnames` argument. 
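
For the headerless case, a minimal sketch looks like this. The in-memory data and the field names `year`, `name` and `rank` are assumptions made for the illustration.

```python
import csv
import io

# Hypothetical headerless data; the column names are supplied by hand
csvfile = io.StringIO('2011,GERALDINE,75\n2012,GIA,67\n')

baby_names = {}
for row in csv.DictReader(csvfile, fieldnames=['year', 'name', 'rank']):
    # Each row is a dictionary keyed by the supplied field names
    baby_names[row['rank']] = row['name']

print(baby_names)
```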

In [6]:
# Import the python CSV module
import csv

# Set the path of the csv file
file_path = '/Users/luismoreno/Documents/R/datacamp/career tracks/python programmer/baby_names.csv'

# Create an empty dictionary
baby_names = {}

# Create a python file object in read mode for the `baby_names.csv` file: csvfile
csvfile = open(file_path, 'r')

# Loop over a DictReader on the file
for row in csv.DictReader(csvfile):
    # Print each row 
    print(row)
    # Add the rank and name to the dictionary: baby_names
    baby_names[row['RANK']] = row['NAME']

# Close the file object
csvfile.close()

# Print the dictionary keys
print(baby_names.keys())

{'BRITH_YEAR': '2011', 'GENDER': 'FEMALE', 'ETHNICTY': 'ASIAN AND PACIFIC ISLANDER', 'NAME': 'GERALDINE', 'COUNT': '13', 'RANK': '75'}
{'BRITH_YEAR': '2012', 'GENDER': 'FEMALE', 'ETHNICTY': 'BLACK NON HISPANIC', 'NAME': 'GIA', 'COUNT': '21', 'RANK': '67'}
{'BRITH_YEAR': '2011', 'GENDER': 'FEMALE', 'ETHNICTY': 'WHITE NON HISP', 'NAME': 'GIANNA', 'COUNT': '49', 'RANK': '42'}
{'BRITH_YEAR': '2012', 'GENDER': 'FEMALE', 'ETHNICTY': 'WHITE NON HISPANIC', 'NAME': 'GISELLE', 'COUNT': '38', 'RANK': '51'}
{'BRITH_YEAR': '2013', 'GENDER': 'FEMALE', 'ETHNICTY': 'ASIAN AND PACIFIC ISLANDER', 'NAME': 'GRACE', 'COUNT': '36', 'RANK': '53'}
{'BRITH_YEAR': '2014', 'GENDER': 'FEMALE', 'ETHNICTY': 'BLACK NON HISPANIC', 'NAME': 'GUADALUPE', 'COUNT': '26', 'RANK': '62'}
{'BRITH_YEAR': '2011', 'GENDER': 'FEMALE', 'ETHNICTY': 'ASIAN AND PACI', 'NAME': 'HAILEY', 'COUNT': '126', 'RANK': '8'}
{'BRITH_YEAR': '2011', 'GENDER': 'FEMALE', 'ETHNICTY': 'BLACK NON HISP', 'NAME': 'HALEY', 'COUNT': '14', 'RANK': '74'}
{'

## [Collections Module](https://docs.python.org/3/library/collections.html)  

- Implements **specialized container** datatypes providing alternatives to Python’s general purpose built-in containers, `dict`, `list`, `set`, and `tuple`.  

### [Counter](https://docs.python.org/3/library/collections.html#collections.Counter)  

- A `Counter` is a `dict` subclass for **counting** hashable objects.  
- You can pass an iterable (list, set, tuple) or a mapping of elements to counts (dictionary) to `Counter`.
- It is a collection where **elements** are stored as **dictionary keys** and their **counts** are stored as **dictionary values**.  
- The [`.most_common()`](https://docs.python.org/3/library/collections.html#collections.Counter.most_common) **method** returns a list of `(element, count)` tuples in **descending order** of count; pass a number `n` to get only the `n` most common elements.

In [3]:
stations = ['stationname', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park',
           'Pulaski-Orange', 'Pulaski-Orange', 'Washington/Wells', 'Washington/Wells',
           'Damen-Brown', 'Damen-Brown', 'Chicago/State', 'Chicago/State', 'Chicago/State']

# Import the Counter object
from collections import Counter

# Print the first ten items from the stations list
print(stations[:10])

# Create a Counter of the stations list: station_count
station_count = Counter(stations)

# Print the station_count
print(station_count)

# Find the 3 most common elements
print(station_count.most_common(3))

['stationname', 'Austin-Forest Park', 'Austin-Forest Park', 'Austin-Forest Park', 'Pulaski-Orange', 'Pulaski-Orange', 'Washington/Wells', 'Washington/Wells', 'Damen-Brown', 'Damen-Brown']
Counter({'Austin-Forest Park': 3, 'Chicago/State': 3, 'Pulaski-Orange': 2, 'Washington/Wells': 2, 'Damen-Brown': 2, 'stationname': 1})
[('Austin-Forest Park', 3), ('Chicago/State', 3), ('Pulaski-Orange', 2)]


### [Defaultdict](https://docs.python.org/3/library/collections.html#collections.defaultdict)  

- Use this when you need a structure to hold nested data, and you may not be certain that the keys will all actually exist.  
- With a plain dictionary, you need to initialize a value for each key before using it. For example, a list has to be initialized on each key before you can append to that list.  
- A `defaultdict` allows you to define what each uninitialized key will contain.  
- You must pass it a callable that creates the default value, typically a **type** such as `list`, `tuple`, `set`, `int`, `str` or `dict`.  


In [4]:
entries = [('01/01/2015', 'Austin-Forest Park', '587'),
           ('01/02/2015', 'Austin-Forest Park', '1386'),
           ('01/03/2015', 'Austin-Forest Park', '785'),
           ('01/04/2015', 'Austin-Forest Park', '625'),
           ('01/05/2015', 'Austin-Forest Park', '1752')]

# Create an empty dictionary: ridership
ridership = {}

# Iterate over the entries
for date, stop, riders in entries:
    # Check to see if date is already in the ridership dictionary
    if date not in ridership:
        # Create an empty list for any missing date
        ridership[date] = []
    # Append the stop and riders as a tuple to the date keys list
    ridership[date].append((stop,riders))
    
# Print the ridership for '03/09/2016'
print(ridership.get('03/09/2016', 'Key Not Available'))

Key Not Available


In [None]:
# Import defaultdict
from collections import defaultdict

# Create a defaultdict with a default type of list: ridership
ridership = defaultdict(list)

# Iterate over the entries
for date, stop, riders in entries:
    # Use the stop as the key of ridership and append the riders to its value
    ridership[stop].append(riders)
    
# Print the first 10 items of the ridership dictionary
print(list(ridership.items())[:10])
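
The cell above was not executed, so here is a self-contained sketch of the same idea with a short `entries` list. The `'Pulaski-Orange'` row and its rider count are made up for the illustration. It also shows an `int` default for running totals, and that reading a missing key creates it.

```python
from collections import defaultdict

# Shortened entries; the 'Pulaski-Orange' row is hypothetical
entries = [('01/01/2015', 'Austin-Forest Park', '587'),
           ('01/02/2015', 'Austin-Forest Park', '1386'),
           ('01/01/2015', 'Pulaski-Orange', '300')]

# list factory: missing keys start as an empty list
riders_by_stop = defaultdict(list)
# int factory: missing keys start at 0, handy for running totals
total_by_stop = defaultdict(int)

for date, stop, riders in entries:
    riders_by_stop[stop].append(riders)
    total_by_stop[stop] += int(riders)

print(dict(riders_by_stop))
print(dict(total_by_stop))

# Merely reading a missing key creates it with the default value
unknown_total = total_by_stop['Unknown stop']
print(unknown_total, 'Unknown stop' in total_by_stop)
```

Because lookups insert missing keys, use `.get()` or `in` when you only want to test for a key without adding it.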

### [OrderedDict](https://docs.python.org/3/library/collections.html#collections.OrderedDict)   

- Since Python 3.7, regular dictionaries are guaranteed to maintain the order in which keys were inserted (in CPython 3.6 this was an implementation detail); in earlier versions you need an `OrderedDict` to maintain insertion order.

- `OrderedDict` is a `dict` subclass that has methods specialized for **rearranging dictionary order**.  

- The [`.popitem()`](https://docs.python.org/3/library/collections.html#collections.OrderedDict.popitem) **method** removes and returns a `(key, value)` pair; by default it returns items in reverse insertion order.  

- Use the `last=False` keyword argument to return the items in insertion order.  

In [None]:
# Import OrderedDict from collections
from collections import OrderedDict

# Create an OrderedDict called: ridership_date
ridership_date = OrderedDict()

# Iterate over the entries
for date, stop, riders in entries:
    # If a key does not exist in ridership_date, set it to 0
    if date not in ridership_date:
        ridership_date[date] = 0

    # Add riders (converted from string to int) to the date key in ridership_date
    ridership_date[date] += int(riders)
    
# Print the first 31 records
print(list(ridership_date.items())[:31])

In [2]:
# Import OrderedDict from collections
from collections import OrderedDict

ridership_date = OrderedDict([('01/01/2015', 233956), ('01/02/2015', 432144),
                              ('01/03/2015', 273207), ('01/04/2015', 217632),
                              ('01/05/2015', 538868), ('01/06/2015', 556918),
                              ('01/07/2015', 416984)])

# Print the first key in ridership_date
print(list(ridership_date.keys())[0])

# Pop the first item from ridership_date and print it
print(ridership_date.popitem(last=False))

# Print the last key in ridership_date
print(list(ridership_date.keys())[-1])

# Pop the last item from ridership_date and print it
print(ridership_date.popitem())

01/01/2015
('01/01/2015', 233956)
01/07/2015
('01/07/2015', 416984)
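
A compact sketch of the two `popitem()` modes, plus `move_to_end()` as one example of the reordering methods mentioned above. It reuses the first few ridership values from the cell.

```python
from collections import OrderedDict

ridership_date = OrderedDict([('01/01/2015', 233956),
                              ('01/02/2015', 432144),
                              ('01/03/2015', 273207)])

# Default: remove and return the most recently inserted pair (LIFO)
last_item = ridership_date.popitem()

# last=False: remove and return the oldest pair (FIFO)
first_item = ridership_date.popitem(last=False)

# move_to_end() with last=False moves an existing key to the front
ridership_date['01/04/2015'] = 217632
ridership_date.move_to_end('01/04/2015', last=False)
remaining_keys = list(ridership_date)
print(last_item, first_item, remaining_keys)
```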


### [namedtuple](https://docs.python.org/3/library/collections.html#collections.namedtuple)  

- A tuple where each position (column) has a name.  
- `namedtuple()` returns a new tuple subclass named **typename**.  
- It is a lightweight alternative to a `pandas` DataFrame row.  
- Capitalizing each word when naming `namedtuples` is a common practice.  
- Each field is available as an attribute of the namedtuple.

In [2]:
entries = [('01/01/2015', 'Austin-Forest Park', '587'),
           ('01/02/2015', 'Austin-Forest Park', '1386'),
           ('01/03/2015', 'Austin-Forest Park', '785'),
           ('01/04/2015', 'Austin-Forest Park', '625'),
           ('01/05/2015', 'Austin-Forest Park', '1752'),
           ('01/06/2015', 'Austin-Forest Park', '1777'),
           ('01/07/2015', 'Austin-Forest Park', '1269'),
           ('01/08/2015', 'Austin-Forest Park', '1435'),
           ('01/09/2015', 'Austin-Forest Park', '1631'),
           ('01/10/2015', 'Austin-Forest Park', '771')]

# Import namedtuple from collections
from collections import namedtuple

# Create the namedtuple: DateDetails
DateDetails = namedtuple('DateDetails', ['date', 'stop', 'riders'])

# Create the empty list: labeled_entries
labeled_entries = []

# Iterate over the entries list
for date, stop, riders in entries:
    # Append a new DateDetails namedtuple instance for each entry to labeled_entries
    details = DateDetails(date, stop, riders)
    labeled_entries.append(details)
    
# Print the first 5 items in labeled_entries
print(labeled_entries[:5])

# Iterate over the first five items in labeled_entries
for item in labeled_entries[:5]:
    # Print each item's stop
    print(item.stop)

    # Print each item's date
    print(item.date)

    # Print each item's riders
    print(item.riders)

[DateDetails(date='01/01/2015', stop='Austin-Forest Park', riders='587'), DateDetails(date='01/02/2015', stop='Austin-Forest Park', riders='1386'), DateDetails(date='01/03/2015', stop='Austin-Forest Park', riders='785'), DateDetails(date='01/04/2015', stop='Austin-Forest Park', riders='625'), DateDetails(date='01/05/2015', stop='Austin-Forest Park', riders='1752')]
Austin-Forest Park
01/01/2015
587
Austin-Forest Park
01/02/2015
1386
Austin-Forest Park
01/03/2015
785
Austin-Forest Park
01/04/2015
625
Austin-Forest Park
01/05/2015
1752


## [datetime](https://docs.python.org/3/library/datetime.html#module-datetime)  

- Use the [**`.strptime()` method**](https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime) to convert a string to a `datetime` object. A [**format**](https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior) must be specified.  
- Use the [**`.strftime()` method**](https://docs.python.org/3/library/datetime.html#datetime.date.strftime) to convert a `datetime` object to a string. You must pass a format string.  


In [1]:
# Import the datetime object from datetime
from datetime import datetime

dates_list = ['02/19/2001', '04/10/2001', '05/30/2001', '07/19/2001', '09/07/2001',
              '10/27/2001', '12/16/2001', '02/04/2002', '03/26/2002', '05/15/2002']

# Iterate over the dates_list 
for date in dates_list:
    # Convert each date to a datetime object: date_dt
    date_dt = datetime.strptime(date, '%m/%d/%Y')
    
    # Print each date_dt
    print(date_dt)

2001-02-19 00:00:00
2001-04-10 00:00:00
2001-05-30 00:00:00
2001-07-19 00:00:00
2001-09-07 00:00:00
2001-10-27 00:00:00
2001-12-16 00:00:00
2002-02-04 00:00:00
2002-03-26 00:00:00
2002-05-15 00:00:00


In [1]:
# Import the datetime object from datetime
from datetime import datetime

datetimes_list = [datetime(2001, 2, 19, 0, 0), datetime(2001, 4, 10, 0, 0),
                  datetime(2001, 5, 30, 0, 0), datetime(2001, 7, 19, 0, 0),
                  datetime(2001, 9, 7, 0, 0), datetime(2001, 10, 27, 0, 0),
                  datetime(2001, 12, 16, 0, 0), datetime(2002, 2, 4, 0, 0),
                  datetime(2002, 3, 26, 0, 0), datetime(2002, 5, 15, 0, 0)]

# Loop over the first 5 items of the datetimes_list
for item in datetimes_list[:5]:
    # Print out the record as a string in the format of 'MM/DD/YYYY'
    print(item.strftime('%m/%d/%Y'))
    
    # Print out the record as an ISO standard string
    print(item.isoformat())

02/19/2001
2001-02-19T00:00:00
04/10/2001
2001-04-10T00:00:00
05/30/2001
2001-05-30T00:00:00
07/19/2001
2001-07-19T00:00:00
09/07/2001
2001-09-07T00:00:00


In [4]:
# Import required modules
from collections import defaultdict
from datetime import datetime

daily_summaries = [('01/01/2001', 'U', '297192', '126455', '423647'),
                   ('01/02/2001', 'W', '780827', '501952', '1282779'),
                   ('01/03/2001', 'W', '824923', '536432', '1361355'),
                   ('01/04/2001', 'W', '870021', '550011', '1420032'),
                   ('01/05/2001', 'W', '890426', '557917', '1448343'),
                   ('01/06/2001', 'A', '577401', '255356', '832757'),
                   ('01/07/2001', 'U', '375831', '169825', '545656'),
                   ('01/08/2001', 'W', '985221', '590706', '1575927'),
                   ('01/09/2001', 'W', '978377', '599905', '1578282'),
                   ('01/10/2001', 'W', '984884', '602052', '1586936')]

# Create a defaultdict of an integer: monthly_total_rides
monthly_total_rides = defaultdict(int)

# Loop over the list daily_summaries
for daily_summary in daily_summaries:
    # Convert the service_date to a datetime object
    service_datetime = datetime.strptime(daily_summary[0], '%m/%d/%Y')

    # Add the total rides to the current amount for the month
    monthly_total_rides[service_datetime.month] += int(daily_summary[4])
    
# Print monthly_total_rides
print(monthly_total_rides)

defaultdict(<class 'int'>, {1: 12055714})
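
Keying on `.month` alone would merge the same month from different years. A hedged variation, assuming the data may span several years, keys on a `(year, month)` tuple instead. The first two rows are taken from `daily_summaries` above; the 2002 row is hypothetical and exists only to show the separation.

```python
from collections import defaultdict
from datetime import datetime

# First two rows come from daily_summaries above; the 2002 row is hypothetical
rows = [('01/01/2001', 'U', '297192', '126455', '423647'),
        ('01/02/2001', 'W', '780827', '501952', '1282779'),
        ('01/01/2002', 'U', '0', '0', '1000')]

# Create a defaultdict of an integer: totals_by_year_month
totals_by_year_month = defaultdict(int)

for row in rows:
    # Convert the service date to a datetime object
    service_datetime = datetime.strptime(row[0], '%m/%d/%Y')

    # Use (year, month) as the key so different years stay separate
    key = (service_datetime.year, service_datetime.month)
    totals_by_year_month[key] += int(row[4])

print(dict(totals_by_year_month))
```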


- Use the `datetime` "now" methods to work with windows or ranges that start from the current date and time.  
- The [**`.now()` method**](https://docs.python.org/3/library/datetime.html#datetime.datetime.now) of the `datetime` class returns the current local date and time.  
- The [**`.utcnow()` method**](https://docs.python.org/3/library/datetime.html#datetime.datetime.utcnow) of the `datetime` class returns the current UTC date and time as a naive object (no `tzinfo`). It is deprecated since Python 3.12 in favor of `datetime.now(timezone.utc)`.  

In [5]:
# Import datetime from the datetime module
from datetime import datetime

# Compute the local datetime: local_dt
local_dt = datetime.now()

# Print the local datetime
print(local_dt)

# Compute the UTC datetime: utc_dt
utc_dt = datetime.utcnow()

# Print the UTC datetime
print(utc_dt)

2021-07-20 20:33:33.148132
2021-07-21 01:33:33.148423
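
Both calls in the cell above return naive datetimes. As a hedged alternative sketch, passing the standard library's `timezone.utc` to `.now()` returns an aware UTC datetime; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Naive local time: no timezone information attached
naive_local = datetime.now()
print(naive_local.tzinfo)

# Aware UTC time: tzinfo is set to UTC
aware_utc = datetime.now(timezone.utc)
print(aware_utc.tzinfo)

# The offset of an aware UTC datetime is zero
print(aware_utc.utcoffset())
```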


- Import the `timezone` object from the `pytz` module. Then use the `timezone` constructor and pass it a name of a timezone.  
- A full list of timezone names can be retrieved at [Wikipedia](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones).  
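- Make a naive `datetime` object **"aware"** by attaching a timezone. Passing a timezone as the `tzinfo` keyword argument to the `.replace()` method works for fixed-offset zones, but with `pytz` zones it attaches the zone's historical local mean time offset (for example `-05:51` for `US/Central`). The `pytz` documentation recommends the timezone's `.localize()` method instead.  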
- An **"aware"** datetime object has an `.astimezone()` method that accepts a `timezone` object and returns a new datetime object in the desired timezone. If the tzinfo is not set for the datetime object it assumes the timezone of the computer you are working on.  

In [5]:
# Import required modules
from datetime import datetime
from pytz import timezone

daily_summaries = [(datetime(2001, 1, 1, 7, 8), '126455'),
                   (datetime(2001, 1, 2, 22, 56), '501952'),
                   (datetime(2001, 1, 3, 11, 43), '536432'),
                   (datetime(2001, 1, 4, 6, 49), '550011'),
                   (datetime(2001, 1, 5, 22, 54), '557917')]

# Create a Timezone object for Chicago
chicago_usa_tz = timezone('US/Central')

# Create a Timezone object for New York
ny_usa_tz = timezone('US/Eastern')

# Iterate over the daily_summaries list
for orig_dt, ridership in daily_summaries:

    # Make the orig_dt timezone "aware" for Chicago. pytz zones need .localize();
    # .replace(tzinfo=...) would attach the local mean time offset (-05:51)
    chicago_dt = chicago_usa_tz.localize(orig_dt)
    
    # Convert chicago_dt to the New York Timezone
    ny_dt = chicago_dt.astimezone(ny_usa_tz)
    
    # Print the chicago_dt, ny_dt, and ridership
    print('Chicago: %s, NY: %s, Ridership: %s' % (chicago_dt, ny_dt, ridership))

Chicago: 2001-01-01 07:08:00-06:00, NY: 2001-01-01 08:08:00-05:00, Ridership: 126455
Chicago: 2001-01-02 22:56:00-06:00, NY: 2001-01-02 23:56:00-05:00, Ridership: 501952
Chicago: 2001-01-03 11:43:00-06:00, NY: 2001-01-03 12:43:00-05:00, Ridership: 536432
Chicago: 2001-01-04 06:49:00-06:00, NY: 2001-01-04 07:49:00-05:00, Ridership: 550011
Chicago: 2001-01-05 22:54:00-06:00, NY: 2001-01-05 23:54:00-05:00, Ridership: 557917
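
A self-contained sketch of the same "make aware, then convert" steps using only the standard library. It assumes fixed UTC offsets (UTC-6 and UTC-5, valid for January), which ignore daylight saving time, so it is not a substitute for named `pytz` zones. The names `central` and `eastern` are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed for January when neither city observes DST
central = timezone(timedelta(hours=-6), 'CST')
eastern = timezone(timedelta(hours=-5), 'EST')

orig_dt = datetime(2001, 1, 1, 7, 8)

# With fixed-offset zones, .replace(tzinfo=...) attaches the offset as given
chicago_dt = orig_dt.replace(tzinfo=central)

# Convert the aware datetime to the Eastern offset
ny_dt = chicago_dt.astimezone(eastern)

print(chicago_dt)
print(ny_dt)
```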


### [timedelta](https://docs.python.org/3/library/datetime.html#timedelta-objects)  

- The `timedelta` object from the `datetime` module represents the difference between two `datetime` objects.  
- Create a `timedelta` by passing any number of **keyword arguments** such as **days**, **seconds**, **microseconds**, **milliseconds**, **minutes**, **hours**, and **weeks** to [**`timedelta()`**](https://docs.python.org/3/library/datetime.html#datetime.timedelta).  
- When you have a `timedelta` object, you can add or subtract it from a `datetime` object to get a `datetime` object relative to the original `datetime` object.  

In [None]:
# Import timedelta from the datetime module
from datetime import timedelta

# Build a timedelta of 30 days: glanceback
glanceback = timedelta(days=30)

# Iterate over the review_dates as date
for date in review_dates:
    # Calculate the date 30 days back: prior_period_dt
    prior_period_dt = date - glanceback
    
    # Print the review_date, day_type and total_ridership
    # (assumes daily_summaries_dt is keyed by datetime)
    print('Date: %s, Type: %s, Total Ridership: %s' %
         (date, 
          daily_summaries_dt[date][1], 
          daily_summaries_dt[date][2]))

    # Print the prior_period_dt, day_type and total_ridership
    print('Date: %s, Type: %s, Total Ridership: %s' %
         (prior_period_dt, 
          daily_summaries_dt[prior_period_dt][1], 
          daily_summaries_dt[prior_period_dt][2]))
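
The cell above depends on `review_dates` and `daily_summaries_dt`, which are not defined in this section. A minimal runnable sketch of the same lookback, assuming a hypothetical dictionary `sample_summaries` keyed by `datetime` with made-up values:

```python
from datetime import datetime, timedelta

# Hypothetical data: datetime -> (service_date, day_type, total_ridership)
sample_summaries = {
    datetime(2001, 1, 1): ('01/01/2001', 'U', 100),
    datetime(2001, 1, 31): ('01/31/2001', 'W', 250),
}
review_dates = [datetime(2001, 1, 31)]

# Build a timedelta of 30 days: glanceback
glanceback = timedelta(days=30)

for date in review_dates:
    # Calculate the date 30 days back: prior_period_dt
    prior_period_dt = date - glanceback

    # Print the day type and ridership for both dates
    for key in (date, prior_period_dt):
        print('Date: %s, Type: %s, Total Ridership: %s' %
              (key, sample_summaries[key][1], sample_summaries[key][2]))
```

Subtracting two `datetime` objects goes the other way and returns a `timedelta`.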

## Reading Data with DictReader  

In [None]:
# Import the modules this cell needs
import csv
from collections import defaultdict

# Create a dictionary that defaults to a list: crimes_by_district
crimes_by_district = defaultdict(list)

# Open the CSV file: csvfile (closed automatically when the block ends)
with open('crime_sampler.csv', 'r', newline='') as csvfile:
    # Loop over a DictReader of the CSV file
    for row in csv.DictReader(csvfile):
        # Pop the district from each row: district
        district = row.pop('District')
        # Append the rest of the data to the list for the proper district
        crimes_by_district[district].append(row)
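
Since `crime_sampler.csv` is not included here, the grouping can be tried on an in-memory file. This is a sketch under assumptions: the rows and the `Primary Type` column are hypothetical, and only the `District` column name comes from the cell above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for crime_sampler.csv
sample_csv = io.StringIO(
    'District,Primary Type\n'
    '14,THEFT\n'
    '14,BATTERY\n'
    '25,THEFT\n'
)

# Create a dictionary that defaults to a list: crimes_by_district
crimes_by_district = defaultdict(list)

# DictReader uses the header row as keys for each row dictionary
for row in csv.DictReader(sample_csv):
    # Remove the district from the row and use it as the grouping key
    district = row.pop('District')
    crimes_by_district[district].append(row)

print(dict(crimes_by_district))
```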

In [None]:
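# Import the modules this cell needs
from collections import Counter
from datetime import datetime

# Loop over crimes_by_district, unpacking each item into district and crimes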
for district, crimes in crimes_by_district.items():
    # Print the district
    print(district)
    
    # Create an empty Counter object: year_count
    year_count = Counter()
    
    # Loop over the crimes:
    for crime in crimes:
        # If there was an arrest
        if crime['Arrest'] == 'true':
            # Convert the Date to a datetime and get the year
            year = datetime.strptime(crime['Date'], '%m/%d/%Y %I:%M:%S %p').year
            # Increment the Counter for the year
            year_count[year] += 1
            
    # Print the counter
    print(year_count)
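
The counting step can be checked in isolation. The rows below are hypothetical and only mimic the `Date` and `Arrest` fields used in the cell above.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows shaped like the DictReader output
crimes = [
    {'Date': '05/23/2016 05:35:00 PM', 'Arrest': 'true'},
    {'Date': '03/26/2016 08:20:00 PM', 'Arrest': 'false'},
    {'Date': '04/25/2017 03:05:00 AM', 'Arrest': 'true'},
]

# Create an empty Counter object: year_count
year_count = Counter()

for crime in crimes:
    # CSV values are strings, so compare against 'true' rather than True
    if crime['Arrest'] == 'true':
        # %I is the 12-hour clock and %p is the AM/PM marker
        year = datetime.strptime(crime['Date'], '%m/%d/%Y %I:%M:%S %p').year
        year_count[year] += 1

print(year_count)
```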

In [None]:
# crimes_by_block is assumed to map each block to a list of crime types
# (it is not defined in this section)

# Create a set of unique crimes for the first block: n_state_st_crimes
n_state_st_crimes = set(crimes_by_block['001XX N STATE ST'])

# Print the set
print(n_state_st_crimes)

# Create a set of unique crimes for the second block: w_terminal_st_crimes
w_terminal_st_crimes = set(crimes_by_block['0000X W TERMINAL ST'])

# Print the set
print(w_terminal_st_crimes)

# Find the crimes in the first block but not in the second: crime_differences
crime_differences = n_state_st_crimes.difference(w_terminal_st_crimes)

# Print the differences
print(crime_differences)
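
A small sketch of the same set operations on made-up data, since `crimes_by_block` is not defined here. The crime types and variable names below are hypothetical.

```python
# Hypothetical crime types recorded for each block
n_state_st = ['THEFT', 'ROBBERY', 'THEFT', 'ASSAULT']
w_terminal_st = ['THEFT', 'NARCOTICS']

# Converting to a set drops the duplicate 'THEFT'
n_state_st_set = set(n_state_st)
w_terminal_st_set = set(w_terminal_st)

# Items in the first set that are not in the second
only_n_state = n_state_st_set.difference(w_terminal_st_set)

# Sets are unordered, so sort before printing for a stable display
print(sorted(only_n_state))
```

Note that `.difference()` is not symmetric: swapping the two sets would return the crimes found only on the second block.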