# 🐍 Step 1 - Python Introduction 

## 📚 Course 1: Python for Data Science: Fundamentals 

## 6️⃣ Functions

---

👦 [Anh-Thi DINH](https://dinhanhthi.com) — 🔥 [dataquest-aio](https://github.com/dinhanhthi/dataquest-aio) on Github.

⚡ **Note**: Some errors in this notebook are intentional; they illustrate incorrect commands.

❓ Are you running this notebook on Google Colab? If so, replace `0` with `1` in the cell below and run it first.

In [1]:
use_colab = 0

## 📝 Mission 315

⏬ Download the takeaway for this mission in folder `/takeaways/` [on Github](https://github.com/dinhanhthi/dataquest-aio/tree/master/takeaways). [Source](https://app.dataquest.io/m/315/functions%3A-fundamentals) of this mission.

❓ **Question**: Define a function that computes the square of a number.

💡 **Hint**: We create a function so that we can reuse it many times without rewriting the same code.

In [1]:
# instead of repeating the same computation for each number, as below,
squared_10 = 10 * 10
squared_16 = 16 * 16

# we can define this
def square(a_number):
    squared_number = a_number * a_number
    return squared_number

# and apply this to the numbers we want
squared_10 = square(a_number=10)
squared_16 = square(a_number=16)

print(squared_10)
print(squared_16)

100
256
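A function really pays off once we apply it to many values. A minimal sketch that reuses `square()` in a loop; the list `numbers` is a hypothetical input chosen only for illustration:

```python
def square(a_number):
    squared_number = a_number * a_number
    return squared_number

numbers = [10, 16, 3]  # hypothetical inputs, for illustration only
squares = []
for n in numbers:
    squares.append(square(n))  # one call per number, no repeated formula

print(squares)  # [100, 256, 9]
```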


💡 Look carefully at the way we define a function.

- The whole body of the function is indented!
- We use `return` to send a value back from a function. However, `return` is optional: a function without it still runs, and calling it gives back `None`.

In [3]:
# a function without a return value
def say_hello(name):
    print("Hello", name, "!") # print() accepts several values and separates them with spaces
    
say_hello("Thi")

Hello Thi !
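Since `say_hello()` has no `return`, calling it still gives back a value: Python's special value `None`. A quick check, redefining the function so the snippet runs on its own:

```python
def say_hello(name):
    print("Hello", name, "!")  # prints something but has no return statement

result = say_hello("Thi")  # prints: Hello Thi !
print(result)              # None
```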


👉 Before going further, let's come back to the Mobile App Store data set by Ramanathan Perumal ([source](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)):

In [2]:
# You don't need to understand the code in this cell yet (we'll cover it later); it's only used to display the table.
# But you do need to run it to see the data set.

import pandas as pd

if use_colab:
    dataquest_aio = 'https://raw.githubusercontent.com/dinhanhthi/dataquest-aio/master/step-1-python-introduction/'
    dataset_url = dataquest_aio + 'course-1-python-for-ds-fundamentals/data/AppleStore.csv'
else:
    dataset_url = './data/AppleStore.csv' # if you use localhost
    
df = pd.read_csv(dataset_url, encoding="utf8")
df.head(5) # only show the first 5 rows of the dataset

|   | id | track_name | size_bytes | currency | price | rating_count_tot | rating_count_ver | user_rating | user_rating_ver | ver | cont_rating | prime_genre | sup_devices.num | ipadSc_urls.num | lang.num | vpp_lic |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 284882215 | Facebook | 389879808 | USD | 0.0 | 2974676 | 212 | 3.5 | 3.5 | 95.0 | 4+ | Social Networking | 37 | 1 | 29 | 1 |
| 1 | 389801252 | Instagram | 113954816 | USD | 0.0 | 2161558 | 1289 | 4.5 | 4.0 | 10.23 | 12+ | Photo & Video | 37 | 0 | 29 | 1 |
| 2 | 529479190 | Clash of Clans | 116476928 | USD | 0.0 | 2130805 | 579 | 4.5 | 4.5 | 9.24.12 | 9+ | Games | 38 | 5 | 18 | 1 |
| 3 | 420009108 | Temple Run | 65921024 | USD | 0.0 | 1724546 | 3842 | 4.5 | 4.0 | 1.6.2 | 9+ | Games | 40 | 5 | 1 | 1 |
| 4 | 284035177 | Pandora - Music & Radio | 130242560 | USD | 0.0 | 1126879 | 3594 | 4.0 | 4.5 | 8.4.1 | 12+ | Music | 37 | 4 | 1 | 1 |


In [3]:
from csv import reader
from urllib.request import urlopen

if use_colab: # you run this file on Google Colab?
    opened_file = urlopen(dataset_url).read().decode('utf-8')
    read_file = reader(opened_file.splitlines())
else: # you run this file on localhost?
    opened_file = open(dataset_url, encoding="utf8")
    read_file = reader(opened_file)

apps_data = list(read_file)
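`apps_data` is a list of lists: the first inner list is the header row, and every value is a string, even the numbers. That is why the functions below skip the header with `[1:]` and call `float()` before doing arithmetic. A small sketch, assuming a trimmed in-memory copy of the CSV (only the first five columns and two rows of the table above, held in a hypothetical `sample_csv`), shows this structure without needing the file:

```python
import io
from csv import reader

# A tiny in-memory stand-in for AppleStore.csv: first five columns, two rows.
sample_csv = io.StringIO(
    "id,track_name,size_bytes,currency,price\n"
    "284882215,Facebook,389879808,USD,0.0\n"
    "389801252,Instagram,113954816,USD,0.0\n"
)

sample_data = list(reader(sample_csv))

print(sample_data[0])           # the header row
print(sample_data[1:])          # the data rows
print(type(sample_data[1][4]))  # <class 'str'>: every value is read as a string
```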

❓ **Question**: Write a function named `extract()` that can extract any column you want from the `apps_data` data set. The function should take in the index number of a column as input.

💡 **Hint**: The returned value is a list containing the value of that column for every row, excluding the header row.

In [7]:
def extract(index):
    column = [] # start with an empty list
    for row in apps_data[1:]: # loop through all rows except the first one (the header)
        value = row[index] # take the value of each row
        column.append(value) # add this value to the list (new column)
    return column # return the column

genres = extract(11) # extract column 11 (prime_genre)
print(genres[:10]) # print first 10 values of that column

['Social Networking', 'Photo & Video', 'Games', 'Games', 'Music', 'Social Networking', 'Reference', 'Games', 'Music', 'Games']


❓ **Question**: Write a function named `freq_table()` that generates a frequency table for any list.

💡 **Hint**: This works like the tasks in the previous sections; the difference here is that you put everything inside a function and return a dictionary.

In [8]:
def freq_table(column):
    frequency_table = {}    
    for value in column: # loop through all values of the column
        if value in frequency_table: # does this value already exist in the dict?
            frequency_table[value] += 1 # if yes, count 1 more
        else:
            frequency_table[value] = 1 # if no, count 1 (first time)
    return frequency_table

genres_ft = freq_table(genres) # apply to column genre extracted in the previous step
print(genres_ft)

{'Social Networking': 167, 'Photo & Video': 349, 'Games': 3862, 'Music': 138, 'Reference': 64, 'Health & Fitness': 180, 'Weather': 72, 'Utilities': 248, 'Travel': 81, 'Shopping': 122, 'News': 75, 'Navigation': 46, 'Lifestyle': 144, 'Entertainment': 535, 'Food & Drink': 63, 'Sports': 114, 'Book': 112, 'Finance': 104, 'Education': 453, 'Productivity': 178, 'Business': 57, 'Catalogs': 10, 'Medical': 23}
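For reference, Python's standard library already provides `collections.Counter`, which builds the same kind of count dictionary. A small sketch using the first five genres printed above checks that our hand-written `freq_table()` (redefined so the snippet runs on its own) agrees with it:

```python
from collections import Counter

def freq_table(column):
    frequency_table = {}
    for value in column:
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
    return frequency_table

# the first five values returned by extract(11) above
genres_sample = ['Social Networking', 'Photo & Video', 'Games', 'Games', 'Music']

manual = freq_table(genres_sample)
builtin = Counter(genres_sample)  # Counter is a dict subclass that counts items

print(manual)                   # {'Social Networking': 1, 'Photo & Video': 1, 'Games': 2, 'Music': 1}
print(manual == dict(builtin))  # True
```

Writing the loop ourselves is still the point of this mission; `Counter` is simply a handy way to double-check the result.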


💡 Above, we combined 2 tasks to build a frequency table for a specific column: first, we wrote a function to extract a column from the data set; then, we counted how often each value appears in that column. Now, let's combine both tasks into a single function.

In [10]:
def freq_table(index):
    frequency_table = {}
    
    for row in apps_data[1:]:
        value = row[index] # get the value in that column
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
            
    return frequency_table

ratings_ft = freq_table(7)
ratings_ft

{'3.5': 702,
 '4.5': 2663,
 '4.0': 1626,
 '3.0': 383,
 '5.0': 492,
 '2.5': 196,
 '2.0': 106,
 '1.5': 56,
 '1.0': 44,
 '0.0': 929}

❓ **Question**: What if we want to apply the function to a data set other than `apps_data`?

💡 **Hint**: Give the function more than one parameter, i.e. the data set as well as the column index.

In [11]:
def freq_table(data_set, index):
    frequency_table = {}
    
    for row in data_set[1:]: # here, we assume every input data set has a header row
        value = row[index]
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
    
    return frequency_table

ratings_ft = freq_table(data_set=apps_data, index=7)
ratings_ft

{'3.5': 702,
 '4.5': 2663,
 '4.0': 1626,
 '3.0': 383,
 '5.0': 492,
 '2.5': 196,
 '2.0': 106,
 '1.5': 56,
 '1.0': 44,
 '0.0': 929}
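Because the data set is now a parameter, the same function works on any list of lists with a header row. A minimal sketch with a made-up `pets_data` table (hypothetical, for illustration only; `freq_table()` is redefined so the snippet runs on its own):

```python
def freq_table(data_set, index):
    frequency_table = {}
    for row in data_set[1:]:  # skip the header row
        value = row[index]
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
    return frequency_table

# a made-up data set with a header row, just like apps_data
pets_data = [
    ['name', 'species'],
    ['Rex', 'dog'],
    ['Tom', 'cat'],
    ['Fido', 'dog'],
]

print(freq_table(pets_data, 1))  # {'dog': 2, 'cat': 1}
```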

In [12]:
# You don't have to name the arguments; just pass them in the same order as the parameters
ratings_ft = freq_table(apps_data, 7)
ratings_ft

{'3.5': 702,
 '4.5': 2663,
 '4.0': 1626,
 '3.0': 383,
 '5.0': 492,
 '2.5': 196,
 '2.0': 106,
 '1.5': 56,
 '1.0': 44,
 '0.0': 929}

❓ **Question**: What's the difference between naming the arguments (keyword arguments) and not naming them (positional arguments)?

In [16]:
# define a subtract function just for testing
def subtract(a, b):
    return a - b

print("subtract(3, 2): ", subtract(3, 2)) # 3 - 2
print("subtract(2, 3): ", subtract(2, 3)) # 2 - 3

print("subtract(a=3, b=2): ", subtract(a=3, b=2)) # 3 - 2
print("subtract(b=2, a=3): ", subtract(b=2, a=3)) # 3 - 2

subtract(3, 2):  1
subtract(2, 3):  -1
subtract(a=3, b=2):  1
subtract(b=2, a=3):  1
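With positional arguments, order decides which parameter gets which value; with keyword arguments, the names decide, so order no longer matters. The two styles can also be mixed in a single call. A small sketch reusing `subtract()` (redefined so it runs on its own):

```python
def subtract(a, b):
    return a - b

# Positional arguments are matched by order, keyword arguments by name.
# They can be mixed, as long as the positional ones come first.
print(subtract(3, b=2))  # 1

# Giving a value for the same parameter twice is not allowed.
try:
    subtract(3, a=2)  # 3 already fills `a` by position
except TypeError as err:
    print("TypeError:", err)
```

Writing a keyword argument before a positional one, as in `subtract(a=3, 2)`, is rejected by Python with a `SyntaxError` before the code even runs.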


❓ **Question**: Write a function named `mean()` that computes the mean of any column we want from a data set.

💡 **Hint**: You can call other functions inside your function.

In [17]:
def extract(data_set, index): # extract a column from a data set
    column = []    
    for row in data_set[1:]:
        value = row[index]
        column.append(value)    
    return column

def find_sum(a_list): # sum of all elements in that column
    a_sum = 0
    for element in a_list:
        a_sum += float(element)
    return a_sum

def find_length(a_list): # how many elements are in that column?
    length = 0
    for element in a_list:
        length += 1
    return length

def mean(data_set, index): # answer the main question
    column = extract(data_set, index)
    return find_sum(column) / find_length(column)
    
avg_price = mean(apps_data, 4)
avg_price

1.7262178685562666
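The helpers `find_sum()` and `find_length()` are written by hand for practice; Python's built-in `sum()` and `len()` do the same jobs. A small sketch with made-up price strings (hypothetical values, not taken from the data set) shows both routes give the same mean:

```python
def find_sum(a_list):
    a_sum = 0
    for element in a_list:
        a_sum += float(element)  # values are strings, so convert first
    return a_sum

def find_length(a_list):
    length = 0
    for element in a_list:
        length += 1
    return length

prices = ['0.0', '0.5', '2.5']  # hypothetical price strings, like the price column

by_hand = find_sum(prices) / find_length(prices)
with_builtins = sum(float(p) for p in prices) / len(prices)

print(by_hand)        # 1.0
print(with_builtins)  # 1.0
```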

💡 If you run into an error, try to understand it and find a solution!

In [19]:
mean(apps_data, 1)

ValueError: could not convert string to float: 'Facebook'

💡 `could not convert string to float: 'Facebook'`: can you guess what this error means?

1. First, check the last line of the traceback for a description of the error; here, it's a `ValueError`.
2. Follow the `--->` arrows to find where the error occurs.
3. Try to understand the cause and fix it!
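One way to apply step 3 is to reproduce the failing operation on its own. Column 1 holds app names such as `'Facebook'`, and `find_sum()` calls `float()` on each value. A short sketch isolating that call:

```python
# Reproduce the failing step on its own: converting a text value to float.
error_message = None
try:
    float('Facebook')
except ValueError as err:
    error_message = str(err)

print(error_message)  # could not convert string to float: 'Facebook'

# A numeric string converts without problems.
print(float('0.99'))  # 0.99
```

So `mean()` only works for columns that hold numbers, such as `price` (index 4).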

---

## 📝 Mission 316

⏬ Download the takeaway for this mission in folder `/takeaways/` [on Github](https://github.com/dinhanhthi/dataquest-aio/tree/master/takeaways). [Source](https://app.dataquest.io/m/316/functions%3A-intermediate) of this mission.