## This homework assesses your ability to build and deploy your applications.


`start date: Nov 22nd 11:59 PM` <br>
`due date: Dec 5th 11:59 PM`

### Make sure you submit your work to Brightspace.

`Total credits:  63/53`

You're welcome to share your thoughts about the homework and the course materials here: https://forms.gle/Kd9AoUZwkMiF8Vx5A

# P1: File path related questions

In the following structure, `project`, `data`, and `scripts` are folder names. `my_notebook.ipynb` is the Jupyter Notebook you used to create these folders.

```
project/
    data/
        file_name.csv
    scripts/
        script.py
my_notebook.ipynb  

```

`P1.1 3 pts` Suppose I'm currently at the same level as the project folder, meaning I am outside the project folder but within the same parent directory. I want to open the `file_name.csv` file using the following code written in `my_notebook.ipynb`:

```python
df = pd.read_csv('file_name.csv')

```
Will this code work? If not, how should I revise it? 

The code will not work. `file_name.csv` is inside the `data` folder, which is inside the `project` folder, but the notebook's working directory is the parent of `project`. The relative path `'file_name.csv'` is resolved against that working directory, so `pd.read_csv()` raises a `FileNotFoundError`. The fix is to pass the path relative to the working directory: `pd.read_csv('project/data/file_name.csv')`.
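A minimal sketch of why the bare file name fails, assuming the layout above is recreated in a temporary directory (the temporary directory and the CSV contents are made up for illustration). It uses only the standard library to check which paths exist; `pd.read_csv` resolves relative paths against the current working directory in the same way.

```python
import os
import tempfile
from pathlib import Path

# Recreate the layout from the question in a throwaway directory (illustrative only).
parent = Path(tempfile.mkdtemp())
data_dir = parent / "project" / "data"
data_dir.mkdir(parents=True)
(data_dir / "file_name.csv").write_text("a,b\n1,2\n")

original_cwd = Path.cwd()
os.chdir(parent)  # pretend the notebook runs in the parent directory, next to project/
try:
    # Relative paths are resolved against the current working directory.
    bare_name_found = Path("file_name.csv").exists()
    full_path_found = Path("project/data/file_name.csv").exists()
finally:
    os.chdir(original_cwd)  # restore the working directory

print(bare_name_found, full_path_found)
```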

`P1.2 3 pts` Suppose you are now in the `data` folder, and you want to re-write your `script.py` file using the `%%writefile` method while reflecting all your actions (e.g., cd commands, file path changes) within the `my_notebook.ipynb` file. Can you demonstrate how you would accomplish this?

Step 1: Use `%cd` to move from the `data` folder up one level into the `project` folder, because the `scripts` folder is located inside `project`.
```
%cd ..
```
Step 2: In a new cell, use `%%writefile` with the path relative to the `project` folder. `%%writefile` must be the first line of its cell, and it overwrites the file if it already exists.
```
%%writefile scripts/script.py
print('Hello World!')
```
Step 3: Verify that the file was written correctly by opening it and printing its contents:
```
with open('scripts/script.py', 'r') as f:
    print(f.read())
```
This prints the contents of `script.py` so you can confirm the rewrite.

`P1.3 3 pts` Suppose you are outside of the `project` folder but still within the same parent directory, and you want to re-write your `script.py` file using the `%%writefile` method while reflecting all your actions (e.g., cd commands, file path changes) within the `my_notebook.ipynb` file. Can you demonstrate how you would accomplish this? Assume you did everything within your `my_notebook.ipynb`.

Step 1: Use `%cd` to move into the `project` folder, which sits directly below the current directory.
```
%cd project
```
Step 2: In a new cell, use `%%writefile` with the path of `script.py` relative to the `project` folder, so the file is overwritten in the correct location.
```
%%writefile scripts/script.py
print('Hello World!')
```
Step 3: Verify that the file was written correctly by opening it and printing its contents:
```
with open('scripts/script.py', 'r') as f:
    print(f.read())
```
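`%%writefile` is an IPython cell magic, so it only works inside a notebook cell. As a rough plain-Python counterpart, the sketch below overwrites `script.py` with `pathlib`. The helper name `write_script` and the temporary directory are hypothetical, introduced only for illustration, and unlike `%%writefile` the helper also creates the `scripts` folder if it is missing.

```python
import tempfile
from pathlib import Path


def write_script(project_dir: Path, source: str) -> Path:
    """Overwrite project_dir/scripts/script.py with the given source text."""
    target = project_dir / "scripts" / "script.py"
    target.parent.mkdir(parents=True, exist_ok=True)  # create scripts/ if it is missing
    target.write_text(source)  # replaces any existing content
    return target


# Throwaway stand-in for the real project folder (illustrative only).
project_dir = Path(tempfile.mkdtemp()) / "project"
written = write_script(project_dir, "print('Hello World!')\n")

# Read the file back to confirm what was written.
content = written.read_text()
print(content)
```

Because the target is built from `project_dir`, this version does not depend on the current working directory, so no `%cd` step is needed.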

`P1.4 3 pts` Suppose you are outside of the project folder but still within the same parent directory, and you want to create a sub-folder named `test_data` under the `data` folder. Can you demonstrate how you would accomplish this? Assume you did everything within your `my_notebook.ipynb`.

Step 1: Import `os` to build the folder path.

Step 2: Define the path for the `test_data` sub-folder with `os.path.join`, relative to the current directory.
```
import os
path = os.path.join('project', 'data', 'test_data')
```
Step 3: At this point `path` is only a string, so the folder still needs to be created. Use `os.makedirs()` to create it.
```
os.makedirs(path)
```
Step 4: Finally, verify that the folder was created by listing the contents of the `data` folder.
```
os.listdir(os.path.join('project', 'data'))
```
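One caveat with the steps above: `os.makedirs(path)` raises a `FileExistsError` if the folder already exists, so re-running the cell fails. A minimal sketch of a re-runnable variant using `exist_ok=True`, with a temporary directory standing in for the notebook's folder (an assumption made for illustration):

```python
import os
import tempfile

# Throwaway parent directory standing in for the notebook's folder (illustrative only).
parent = tempfile.mkdtemp()
path = os.path.join(parent, "project", "data", "test_data")

os.makedirs(path, exist_ok=True)  # creates project/, data/, and test_data/ as needed
os.makedirs(path, exist_ok=True)  # second call is a no-op instead of raising FileExistsError

# List the data folder to confirm the new sub-folder is there.
contents = os.listdir(os.path.join(parent, "project", "data"))
print(contents)
```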

`P1.5 5 pts` Suppose you are outside of the project folder but still within the same parent directory, and you want to import your `script.py` file as a module. Your `script.py` file has the following contents:

```python
df = pd.read_csv('file_name.csv')
```

You have used the following code to do the import within your `my_notebook.ipynb` file:

```python
import script
```

What are the issues with the `script.py` file itself and also the way to import it? Explain below and fix it yourself.

One issue is that the relative path in `pd.read_csv()` is resolved against the current working directory, not against the location of `script.py`. Because the script is imported from outside the `project` folder, the call raises a `FileNotFoundError`. To fix this, build the path from the script's own location with `os.path.dirname(__file__)`. This requires `import os` and `import pandas as pd` at the top of `script.py`.
```
path = os.path.join(os.path.dirname(__file__), '..', 'data', 'file_name.csv')
df = pd.read_csv(path)
```

Another issue is that `import script` assumes `script.py` is in the current directory or elsewhere on Python's module search path. It is not, so the import raises a `ModuleNotFoundError`. To fix this, add the `scripts` folder to `sys.path` before importing, so the notebook can locate `script.py`.
```
import sys
from pathlib import Path
path = Path('project/scripts').resolve()
sys.path.append(str(path))
import script
```
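Both fixes can be tried together in a small sketch, assuming the layout is recreated in a temporary directory. To stay within the standard library, the stand-in `script.py` below only computes the CSV path (the name `csv_path` is hypothetical) instead of calling `pd.read_csv`. The sketch uses `sys.path.insert(0, ...)` rather than `append` so the temporary folder takes precedence over any other `script.py` that happens to be importable.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Recreate the layout in a throwaway directory (illustrative only).
parent = Path(tempfile.mkdtemp())
(parent / "project" / "data").mkdir(parents=True)
(parent / "project" / "scripts").mkdir(parents=True)
(parent / "project" / "data" / "file_name.csv").write_text("a,b\n1,2\n")

# Stand-in script.py: builds the CSV path from its own location,
# not from the current working directory.
(parent / "project" / "scripts" / "script.py").write_text(
    "import os\n"
    "csv_path = os.path.abspath(\n"
    "    os.path.join(os.path.dirname(__file__), '..', 'data', 'file_name.csv')\n"
    ")\n"
)

# Make the scripts folder importable, then import the module by name.
sys.path.insert(0, str(parent / "project" / "scripts"))
script = importlib.import_module("script")

# The path computed inside the module points at the real CSV file.
csv_exists = Path(script.csv_path).exists()
print(script.csv_path, csv_exists)
```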

`P1.6 6 pts` Suppose you are outside of the project folder but still within the same parent directory, and you want to first import the class `clean_data` from `script.py`, and then create an instance of the class. Your `script.py` file has the following contents:

```python
class clean_data:

    def __init__(self):
        df = pd.read_csv('file_name.csv')
```

You have used the following code to import the class and create an instance:

```python
import script

clean_data = clean_data()

```

What are the issues with the `script.py` file itself (1 issue), the way to import it (2 issues), and the way to create the instance (1 issue)? Explain below and fix it yourself.

In the `clean_data` class, the file path passed to `pd.read_csv()` is wrong. It is a relative path, so it is resolved against the current working directory, and `file_name.csv` is not there. Instead, construct the absolute path to `file_name.csv` from the script's own location and pass that path to `pd.read_csv()`.

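There are also two issues with importing the `clean_data` class. First, `import script` fails with a `ModuleNotFoundError` because the `scripts` folder is not on Python's module search path. The path needs to be added to `sys.path` in order to import the module properly.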
```
import sys
from pathlib import Path
path = Path('project/scripts').resolve()
sys.path.append(str(path))
```
The other issue with importing is that `import script` does not bring `clean_data` into the notebook's namespace. Calling `clean_data()` after `import script` raises a `NameError`, since the name `clean_data` is not defined there. To fix this, import `clean_data` directly (or refer to it as `script.clean_data`).
```
from script import clean_data
```

There is also an issue with creating the instance. Writing `clean_data = clean_data()` rebinds the name `clean_data` to the new instance, so the imported class is no longer reachable under that name. The fix is to give the instance a different name, for example `cleaner = clean_data()`.
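The rebinding problem can be reproduced with a stand-in class. In this sketch the class body is a placeholder (the real `clean_data` would load the CSV), and the names `cleaner` and `another` are arbitrary.

```python
class clean_data:
    """Stand-in for the class in script.py (the real one would load the CSV)."""

    def __init__(self):
        self.rows = []


cleaner = clean_data()  # distinct instance name: the class stays reachable
another = clean_data()  # so a second instance can still be created

clean_data = clean_data()  # rebinding: the name now refers to an instance
try:
    clean_data()  # the instance is not callable, so this fails
    shadowed = False
except TypeError:
    shadowed = True

print(shadowed)
```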

# P2. `30 pts` Coding challenges (Individual version)

For this question, you will build a movie recommendation system using K-Nearest Neighbors (KNN) and create a webpage interface with Streamlit. The Streamlit webpage should allow the user to input a movie name they have watched before, and based on that input, the system will recommend 4 similar movies. Additionally, you will deploy your Streamlit app on AWS EC2.

For this question, I will place fewer restrictions on the choice of data and features, allowing you to mimic real-world decision-making as data professionals. In previous assignments, I guided you step-by-step to teach you how to correctly code each small part. Now that you've gained those foundational skills, this assignment will focus more on exercising your ability to design and solve problems independently.

You are free to use any dataset you like for this assignment. The movie dataset from HW8 is sufficient, but you are welcome to explore and use other datasets if you prefer.

Your final submission should include:

- A screenshot of the Streamlit webpage displaying your app and its recommendations.
- The source code used to create the application.

`Bonus 10 pts` Use a pre-trained LLM to add a short description of the movie provided by the user.


```python
import pandas as pd
import streamlit as st
```
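One possible starting point for the KNN step is sketched below, written with only the standard library so it runs without extra packages. A full solution would more likely fit a nearest-neighbors model from a library such as scikit-learn on features from the chosen dataset. The movie titles and genre scores here are invented placeholders, and `recommend` and `cosine_distance` are hypothetical helper names.

```python
import math

# Invented placeholder data: title -> [action, comedy, romance, sci-fi] scores.
MOVIES = {
    "Movie A": [1.0, 0.0, 0.0, 1.0],
    "Movie B": [0.9, 0.1, 0.0, 0.8],
    "Movie C": [0.0, 1.0, 0.8, 0.0],
    "Movie D": [0.1, 0.9, 1.0, 0.0],
    "Movie E": [0.8, 0.0, 0.1, 1.0],
    "Movie F": [0.5, 0.5, 0.0, 0.5],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def recommend(title, k=4):
    """Return the k titles nearest to `title`, excluding the title itself."""
    query = MOVIES[title]
    others = [
        (cosine_distance(query, vec), name)
        for name, vec in MOVIES.items()
        if name != title
    ]
    # Sort by distance (smallest first) and keep the k closest titles.
    return [name for _, name in sorted(others)[:k]]


recommendations = recommend("Movie A")
print(recommendations)
```

In a Streamlit app, the title passed to `recommend` would come from a user input widget, and the returned list would be displayed on the page.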



# P2. `30 pts` Coding challenges (Group version)

For this assignment, you are allowed to collaborate with up to 2 classmates and turn it into a group project. You can choose any problem and dataset to work on, but your project must demonstrate 3 of the following skills. Note: You must include either 3 (Running the project in Docker) or 4 (Deploying the application on AWS EC2) as one of your chosen skills.

1. Creating a Python package.
2. Building a webpage using Streamlit with user input functionality.
3. Running the project in Docker (mandatory if 4 is not chosen).
4. Deploying the application on AWS EC2 (mandatory if 3 is not chosen).
5. Applying KNN to solve a problem.
6. Utilizing LLMs in your application.


In your submission, you must clearly list your teammates' names, and each team member must submit the assignment individually on Brightspace.

Your final submission should include:

- A screenshot of the Streamlit webpage displaying your app and its functionality.
- The source code for the application.

`Bonus 10 pts` 


Prepare a roughly 5 minute presentation to deliver in class. If you plan to present, please notify me at least 2 days before the deadline to allow sufficient time for adjustments to the course schedule. Bonus points will be awarded after the presentation.