<div class="alert alert-block alert-info">
<h3> <b> Part 1. Project Setup </b></h3>
<div>

❓ **Your challenge**: 

- In your project folder, create two folders: `Raw Data` to store the data extracted from GitHub, and `Result` to store any output files
- Then, create a module `download_data.py` that connects to our data source ("https://github.com/CapitalOneRecruiting/DA-Airline-Data-Challenge/raw/main/data.zip"), downloads the zip file, unzips it, and saves the contents to the `Raw Data` folder you just created. Make sure the zip file is deleted automatically once the data extraction is done

💡 Suggested methodology:
- Use the notebook below to write and test your code step-by-step first
- Then copy the code into `download_data.py` once you are certain of your code logic
- Lastly, import the `download_data.py` module to confirm that it works

🔥 Notebook best practices (must-read) 👇

<details>
    <summary>▸ <i>click here</i></summary>

From now on, exploratory notebooks are going to become pretty long, and we strongly advise you to follow these notebook principles:
- Code your logic so that your notebook can always be run from top to bottom without crashing (Cell --> Run All)
- Name your variables carefully 
- Use dummy names such as `tmp` or `_` for intermediary steps when you know you won't need them for long
- Clean up your code and merge cells when relevant to minimize notebook size (`Shift-M`)
- Hide your cell output if you don't need to see it anymore (double-click on the red `Out[]:` section to the left of your cell).
- Make heavy use of the Jupyter nbextensions `Collapsible Headings` and `Table of Contents` (call a TA if you can't find them)
- Use the following shortcuts 
    - `a` to insert a cell above
    - `b` to insert a cell below
    - `dd` to delete a cell
    - `esc` and `arrows` to move between cells
    - `Shift-Enter` to execute cell and move focus to the next one
    - use `Shift + Tab` when you are between method brackets e.g. `groupby()` to get the docs! Repeat a few times to open it permanently

</details>





In [7]:
# Add any packages you need here
import pyforest
import copy
import string
import missingno as msno
import requests
import zipfile
import os

import warnings
warnings.filterwarnings("ignore")

# `pd` is available without an explicit import because pyforest lazy-loads pandas
# pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)
pd.set_option("display.float_format", lambda x: "%.2f" % x)  # suppress scientific notation

# This can help to autoreload the packages you create
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


<div class="alert alert-block alert-success">
<h4> <b> 1. Project Folder Setup </b></h4>
<div>

#### a) `Raw Data`

🎁 We give you the pseudo-code below 👇 for this first operation:

> 1. Check if in the project folder, there is already a `Raw Data` folder
2. If True, set your default directory to the `Raw Data` folder
3. If not, create a `Raw Data` folder in the project folder, and print "Raw data folder created"

<details>
    <summary>💡Hint for functions you may need</summary>

- if-else, 
- os.path.join()    
- os.getcwd()    
- os.mkdir()

</details>

In [8]:
# Create/change data import file path - write your code here
# Build the path first so import_file_path is defined in both branches
import_file_path = os.path.join(os.getcwd(), "Raw Data")
if os.path.exists(import_file_path):
    print("Import file path is:", import_file_path)
else:
    os.mkdir(import_file_path)
    print("Raw Data folder created")

Raw Data folder created


#### b) `Result`

🎁 We give you the pseudo-code below 👇 for this operation:

> 1. Check if in the project folder, there is already a `Result` folder
2. If True, set your default directory to the `Result` folder
3. If not, create a `Result` folder in the project folder, and print "Result folder created"

In [10]:
# Create/change result export file path - write your code here
# Build the path first so result_file_path is defined in both branches
result_file_path = os.path.join(os.getcwd(), "Result")
if os.path.exists(result_file_path):
    print("Result file path is:", result_file_path)
else:
    os.mkdir(result_file_path)
    print("Result folder created")

Result folder created
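
Both folder checks follow the same pattern, so they can be folded into one helper. The sketch below is one possible way to do it, not part of the original challenge: the function name `ensure_folder` and its `base_dir` parameter are hypothetical, and it uses `os.makedirs(..., exist_ok=True)` in place of the explicit if-else so that the cell is safe to re-run.

```python
import os


def ensure_folder(name, base_dir=None):
    """Create `name` under `base_dir` if it is missing and return its full path.

    `ensure_folder` and `base_dir` are illustrative names, not part of the challenge.
    """
    base_dir = base_dir or os.getcwd()
    folder_path = os.path.join(base_dir, name)
    # exist_ok=True makes the call a no-op when the folder is already there,
    # so the notebook can be run from top to bottom repeatedly
    os.makedirs(folder_path, exist_ok=True)
    return folder_path
```

With this helper, `import_file_path = ensure_folder("Raw Data")` and `result_file_path = ensure_folder("Result")` would give you both paths whether or not the folders existed beforehand.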


<div class="alert alert-block alert-success">
<h4> <b>2. Code download_data.py </b></h4>
<div>

#### a) `fetch_zip_file` function

👉 Our goal is to create a function called `fetch_zip_file` that downloads the data from a URL automatically. Here is what you can do:

> 1. Use `requests.get` to download the data from the URL ("https://github.com/CapitalOneRecruiting/DA-Airline-Data-Challenge/raw/main/data.zip")
2. Make sure the status code is 200; otherwise print "ZIP file request failed!"
3. Save the downloaded zip file with the name `data.zip` in the `Raw Data` folder

<details>
    <summary>💡Hint for functions you may need</summary>

- requests.get() 
- open
- write

</details>

In [9]:
# Create fetch_zip_file function - write your code here
def fetch_zip_file():
    """
    Download the ZIP file from `url` and save it as Raw Data/data.zip
    """
    # Try to acquire the zip file (`url` must be defined at notebook/module level)
    try:
        response = requests.get(url)
    except OSError:  # requests' exceptions derive from OSError
        print("Connection Failed!")
        return None

    # Check if the request works
    if response.status_code == 200:
        # Save dataset to file; the with-block closes the file handle
        print("File request successfully")
        with open("Raw Data/data.zip", "wb") as zip_file:
            zip_file.write(response.content)
    else:
        print("ZIP file request failed!")
        return None
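
If the archive is large, holding the whole response in memory before writing it may not be ideal. The variant below is a sketch under a few assumptions rather than the expected solution: the name `download_zip`, its parameters, and the 30-second timeout are all hypothetical choices. It streams the body to disk in chunks, takes the URL and destination as arguments instead of reading a global `url`, and returns `True` or `False` so the caller can tell whether a file was written. The `http_get` parameter exists only so the function can be exercised with a stub when no network is available.

```python
import os


def download_zip(url, dest_path, http_get=None, chunk_size=8192):
    """Download `url` to `dest_path`; return True on success, False otherwise.

    `download_zip`, `http_get` and `chunk_size` are illustrative names.
    """
    if http_get is None:
        import requests  # imported lazily so a stub can replace it in tests
        http_get = requests.get

    try:
        # stream=True defers the body download until iter_content is consumed
        response = http_get(url, stream=True, timeout=30)
    except OSError:
        # requests' exceptions derive from OSError
        print("Connection Failed!")
        return False

    if response.status_code != 200:
        print("ZIP file request failed!")
        return False

    # Write the body chunk by chunk instead of loading it all at once
    with open(dest_path, "wb") as zip_file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            zip_file.write(chunk)
    return True
```

A call such as `download_zip(url, os.path.join("Raw Data", "data.zip"))` would then replace the `fetch_zip_file()` call, and the returned flag can decide whether to go on to the unzip step.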

#### b) `main` function

👉 Our goal is to create a function called `main` that runs `fetch_zip_file()`, unzips the data, and deletes the .zip file once the extraction is done. Here is what you can do:

> 1. Run `fetch_zip_file()`, which requests `data.zip` from the URL and saves the zip file in the `Raw Data` folder
2. Unzip the file (you can use `zipfile.ZipFile` and its `extractall` method to achieve this)
3. Delete the `data.zip` file once you have made sure the data is successfully unzipped

<details>
    <summary>💡Hint for functions you may need</summary>

- zipfile.ZipFile()
- os.remove()

</details>

In [10]:
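# Create main function - write your code here
def main():
    zip_path = os.path.join("Raw Data", "data.zip")

    # Get the ZIP file
    fetch_zip_file()

    # Stop here if the download did not produce a file
    if not os.path.exists(zip_path):
        return

    # Unzip; extractall raises on a corrupt archive, so the delete below
    # only runs after a successful extraction
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall("Raw Data")

    # Delete zip file
    os.remove(zip_path)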
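
Step 3 asks you to confirm the extraction before deleting the archive. One way to make that check explicit is sketched below. The helper name `unzip_and_clean` and its return convention are assumptions made for illustration. It validates the file with `zipfile.is_zipfile`, extracts it, and removes the archive only if every member listed in it now exists in the target folder.

```python
import os
import zipfile


def unzip_and_clean(zip_path, target_dir):
    """Extract `zip_path` into `target_dir`, then delete the archive.

    Returns the list of member names, or None if the file is not a valid ZIP.
    `unzip_and_clean` is an illustrative name, not part of the challenge.
    """
    # is_zipfile returns False for a missing file as well as for a non-ZIP file
    if not zipfile.is_zipfile(zip_path):
        print("Not a valid ZIP file:", zip_path)
        return None

    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        members = zip_ref.namelist()
        zip_ref.extractall(target_dir)

    # Only delete the archive once every listed member is present on disk
    if all(os.path.exists(os.path.join(target_dir, name)) for name in members):
        os.remove(zip_path)
    return members
```

Compared with counting the entries in `Raw Data`, which is never zero while `data.zip` itself is still there, this ties the delete to the files that actually came out of the archive.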

#### c) Test your functions 

In [None]:
# Test your main function here with the following code
url = "https://github.com/CapitalOneRecruiting/DA-Airline-Data-Challenge/raw/main/data.zip"

main()

<div class="alert alert-block alert-success">
<h4> <b>3. Build & Test download_data.py </b></h4>
<div>

👉 Convert your Jupyter notebook code into a .py file for later use. The cell below shows how to import the module and test it.

In [13]:
%%time
# Test the download_data.py module - write your code here
from Airport import download_data

download_data.main()

File request successfully
Wall time: 44.8 s


🏁 Congratulations! 

💾 Save your notebook before starting the next challenge.