# Project Plan:
## Assumptions:
- I am going to assume that the data available for the recommender engine will not include user or customer information. A recommender would normally need user demographics, user behavior, and probably device and other contextual data, none of which we have. I assume this is what the data scientists will simulate. Because I don't yet know which variables they will want to change, I can't say which fields the tables will need, so for now I am assuming they will be creating user demographic data.
- I am going to assume that the data scientists know how to use pandas.
- I am going to assume that analysts can use pgAdmin to access the database for ad hoc queries.
- I am going to assume that the data scientists will use IPython (Jupyter) notebooks to interact with the database.

## Overall Objectives:
- Provide a database to support the simulation engine.
- Offer both direct SQL access and Python APIs for read and write operations.
- Create a demonstration recommender engine.

## Plan:

I plan to use FastAPI and SQLModel for the interface between the API and the database.


### 1. Evaluate and Select RDBMS:
#### Objective: Choose an RDBMS that fits the needs of the project.
#### Tasks:
##### PostgreSQL is a strong choice for the recommender system for these reasons:
- Scalability: It scales well vertically and supports horizontal read scaling through streaming replication (sharding writes requires an extension such as Citus), making it suitable for a growing system.
- Performance: PostgreSQL's cost-based query planner, multiple join strategies, window functions, and index types (e.g. B-tree and GIN) let it handle complex queries efficiently. This matters for a recommender system that joins content, credits, and rating data.
- Community Support: With a strong open-source community, PostgreSQL provides extensive documentation, forums, and support, aiding in both development and troubleshooting.
- Ease of Integration with Python ORMs: PostgreSQL works with Python Object-Relational Mapping (ORM) tools such as SQLAlchemy and SQLModel (which is built on SQLAlchemy and Pydantic), and SQLModel models plug directly into FastAPI, which streamlines development.
- In summary, the combination of scalability, advanced query performance, vibrant community support, and easy integration with popular Python ORMs makes PostgreSQL a fitting choice for building a recommender system that can evolve and adapt to complex data needs.


### Notes:
- If we want to normalize further (see ERD 2), we will need to extract the main production country from the "best" tables by matching, e.g., best_movies['main_production_country'] against prod_countries['country'].
1. ✅ Change name of id columns in credits and titles dfs to content_id
2. Put content_id into best dfs
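The two tasks above, plus the further normalization step, might look roughly like this in pandas. This is a minimal sketch on made-up miniature frames: only `id`, `main_production_country`, and `country` come from this plan; the `title`/`release_year` join keys and the `country_id` column are assumptions about the real data.

```python
import pandas as pd

# Hypothetical miniature frames; columns other than id,
# main_production_country, and country are ASSUMPTIONS.
titles = pd.DataFrame({
    "id": ["tm1", "tm2", "tm3"],
    "title": ["Alpha", "Beta", "Gamma"],
    "release_year": [2001, 2010, 2015],
})
credits = pd.DataFrame({"id": ["tm1", "tm1", "tm2"], "person_id": [10, 11, 12]})
best_movies = pd.DataFrame({
    "title": ["Alpha", "Gamma"],
    "release_year": [2001, 2015],
    "main_production_country": ["US", "IN"],
})
prod_countries = pd.DataFrame({"country_id": [1, 2, 3], "country": ["US", "GB", "IN"]})

# Task 1: rename the id columns to content_id.
titles = titles.rename(columns={"id": "content_id"})
credits = credits.rename(columns={"id": "content_id"})

# Task 2: bring content_id into the "best" frame. Treating (title, release_year)
# as a unique key for a title is an ASSUMPTION.
best_movies = best_movies.merge(
    titles[["content_id", "title", "release_year"]],
    on=["title", "release_year"],
    how="left",
    validate="many_to_one",  # raise if one (title, year) matches several titles
)

# Further normalization: replace the country name with a foreign key.
best_movies = best_movies.merge(
    prod_countries, left_on="main_production_country", right_on="country", how="left"
).drop(columns=["country", "main_production_country"])

unmatched = best_movies["content_id"].isna().sum()
print(best_movies)
print(f"rows without a content_id: {unmatched}")
```

`validate="many_to_one"` makes pandas raise instead of silently duplicating rows when a key is ambiguous, and counting null `content_id` values afterwards shows which "best" rows failed to match.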

### 1.1 Create raw schema for raw original data
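One low-effort way to create the raw schema is to derive each table's DDL from its CSV header, storing every column as TEXT so the raw layer keeps values exactly as they appear in the files and type casting happens later, in the normalized schema. A minimal sketch; the column-name cleanup rules are illustrative assumptions, not decisions from this plan:

```python
import csv
import io
import re


def to_identifier(name: str) -> str:
    """Normalize a CSV header into a lowercase snake_case SQL identifier."""
    ident = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
    if not ident or ident[0].isdigit():
        ident = f"col_{ident}"  # identifiers may not start with a digit
    # Note: reserved words such as "order" would still need quoting.
    return ident


def raw_table_ddl(table: str, csv_file, schema: str = "raw") -> str:
    """Build PostgreSQL DDL for a raw staging table whose columns are all TEXT."""
    header = next(csv.reader(csv_file))
    columns = ",\n    ".join(f"{to_identifier(h)} TEXT" for h in header)
    schema_id, table_id = to_identifier(schema), to_identifier(table)
    return (
        f"CREATE SCHEMA IF NOT EXISTS {schema_id};\n"
        f"CREATE TABLE IF NOT EXISTS {schema_id}.{table_id} (\n    {columns}\n);"
    )


sample = io.StringIO("id,title,release_year,IMDb Score\ntm1,Alpha,2001,7.1\n")
print(raw_table_ddl("titles", sample))
```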

### 1.2 Load raw data into the raw schema, meeting the analysts' request to see raw data via ad hoc queries.
1. ✅ Load into db (see load_raw_data.ipynb)
2. ✅ Create a read-only user for analysts.
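The read-only analyst user could be created with statements along these lines. This sketch only builds the PostgreSQL statements; the role and database names are hypothetical, and the identifier check exists because role and schema names cannot be passed as bind parameters.

```python
import re

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def read_only_role_sql(role: str, database: str, schema: str = "raw") -> list[str]:
    """Return PostgreSQL statements creating a login role limited to SELECT on one schema."""
    for name in (role, database, schema):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        # No password in the statement; set it interactively with psql's \password.
        f"CREATE ROLE {role} LOGIN;",
        f"GRANT CONNECT ON DATABASE {database} TO {role};",
        f"GRANT USAGE ON SCHEMA {schema} TO {role};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};",
        # Also covers tables created later by the role that runs this statement.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} GRANT SELECT ON TABLES TO {role};",
    ]


for stmt in read_only_role_sql("analyst_ro", "recommender"):
    print(stmt)
```

Run the statements as the owner of the raw tables (for example in pgAdmin's Query Tool), since `ALTER DEFAULT PRIVILEGES` only applies to tables later created by the role that runs it.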

### 2. Design and Normalize the Database:
#### Objective: Transform the initial data into a suitable data model.
#### Tasks:
- Analyze initial data.
- Design tables and relationships.
- Normalize to at least second normal form (2NF) to balance analytical needs against performance.
- Document assumptions and rationale for design choices.
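As a concrete 2NF example, consider a credits table keyed on (content_id, person_id) that also stores each person's name. The name depends on person_id alone, a partial dependency on the key, so 2NF moves it into its own table. A minimal sketch with hypothetical rows and column names:

```python
# Hypothetical denormalized credits rows. The key is (content_id, person_id),
# but `name` depends only on person_id: a partial dependency that violates 2NF.
credits_rows = [
    {"content_id": "tm1", "person_id": 10, "name": "Ana", "role": "ACTOR"},
    {"content_id": "tm2", "person_id": 10, "name": "Ana", "role": "DIRECTOR"},
    {"content_id": "tm2", "person_id": 11, "name": "Ben", "role": "ACTOR"},
]


def decompose_credits(rows):
    """Split credits into a person table and a credit table to reach 2NF."""
    person = {}  # person_id -> name (depends on person_id only)
    credit = []  # (content_id, person_id, role) (role depends on the whole key)
    for r in rows:
        existing = person.setdefault(r["person_id"], r["name"])
        if existing != r["name"]:
            raise ValueError(f"person {r['person_id']} has conflicting names")
        credit.append((r["content_id"], r["person_id"], r["role"]))
    return person, credit


person, credit = decompose_credits(credits_rows)
print(person)  # {10: 'Ana', 11: 'Ben'}
print(credit)
```

The conflicting-name check doubles as a data-quality test: if the same person_id appears with two names, the source data itself needs a decision before loading.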

### 2.1 Create a relational schema for the normalized data, define the table schemas, and load data into the tables
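The normalized tables could be created and loaded along these lines. This sketch uses Python's built-in sqlite3 with an in-memory database as a stand-in for PostgreSQL so it runs without a server; the CREATE TABLE statements are also valid PostgreSQL, but the table and column names are assumptions. In the real project the same steps would go through SQLModel.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE content (
    content_id   TEXT PRIMARY KEY,
    title        TEXT NOT NULL,
    release_year INTEGER
);
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE credit (
    content_id TEXT    NOT NULL REFERENCES content(content_id),
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    role       TEXT    NOT NULL,
    PRIMARY KEY (content_id, person_id, role)
);
""")

with conn:  # commit on success, roll back on error
    conn.executemany("INSERT INTO content VALUES (?, ?, ?)",
                     [("tm1", "Alpha", 2001), ("tm2", "Beta", 2010)])
    conn.executemany("INSERT INTO person VALUES (?, ?)", [(10, "Ana"), (11, "Ben")])
    conn.executemany("INSERT INTO credit VALUES (?, ?, ?)",
                     [("tm1", 10, "ACTOR"), ("tm2", 10, "DIRECTOR"), ("tm2", 11, "ACTOR")])

# A credit pointing at an unknown title is rejected by the foreign key.
try:
    with conn:
        conn.execute("INSERT INTO credit VALUES ('tm999', 10, 'ACTOR')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Loading parents (content, person) before children (credit) keeps every insert valid while the foreign keys are enforced.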

### 3. Set Up Database and Load Initial Data:
#### Objective: Create the database and import the initial data.
#### Tasks:
- Set up the selected RDBMS.
- Create tables as designed.
- Load initial data.
- Test with sample queries to ensure everything is working.
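For the sample-query step, a small set of sanity checks can be scripted so they run after every load. A sketch, again using in-memory sqlite3 as a stand-in for PostgreSQL with hypothetical table names: it checks that tables are non-empty and looks for orphaned rows with a LEFT JOIN, which is useful when data is loaded before foreign keys are enforced.

```python
import sqlite3


def smoke_test(conn):
    """Run sanity queries after a load; return a list of problems found."""
    problems = []
    for table in ("content", "credit"):  # fixed names, so the f-string is safe
        (n,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        if n == 0:
            problems.append(f"{table} is empty")
    # Orphan check: credits whose content_id has no matching content row.
    (orphans,) = conn.execute("""
        SELECT COUNT(*) FROM credit c
        LEFT JOIN content t ON t.content_id = c.content_id
        WHERE t.content_id IS NULL
    """).fetchone()
    if orphans:
        problems.append(f"{orphans} credit rows reference missing content")
    return problems


# Tiny fixture without foreign keys, so the orphan check has something to find.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (content_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE credit (content_id TEXT, person_id INTEGER);
INSERT INTO content VALUES ('tm1', 'Alpha');
INSERT INTO credit VALUES ('tm1', 10), ('tm999', 11);
""")
print(smoke_test(conn))  # ['1 credit rows reference missing content']
```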

### 4. Develop APIs for Reading and Writing Data:
#### Objective: Enable interaction with the database through Python APIs.
#### Tasks:
- Design Python API endpoints for required operations.
- Implement read and write operations.
- Provide access to ad-hoc SQL queries.
- Ensure security and validation of inputs.
- Document API usage.

- 4.1 Design API Endpoints
Identify the operations needed, such as retrieving specific data, updating records, or inserting new records.
Define clear and concise endpoints for each operation. For example, you might have endpoints like /get_movie_details and /add_user_rating.

- 4.2. Implement Read and Write Operations
Use an ORM such as SQLModel (built on SQLAlchemy) to manage database interactions within Python, consistent with the FastAPI and SQLModel stack chosen above. This lets you define models that correspond to your database tables and perform CRUD operations easily.
Write functions that correspond to each endpoint. These functions should translate incoming requests into appropriate database queries.
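Here is a sketch of the database logic behind the two example endpoints from 4.1. It uses sqlite3 as a stand-in for a SQLModel session against PostgreSQL; the function, table, and column names are hypothetical, and the FastAPI routes would simply wrap these functions. The `INSERT ... ON CONFLICT ... DO UPDATE` upsert works in both SQLite (3.24+) and PostgreSQL, so re-rating a title updates the existing row.

```python
import sqlite3

DETAIL_COLUMNS = ("content_id", "title", "release_year")


def get_movie_details(conn, content_id):
    """Read path for a hypothetical GET /get_movie_details route."""
    row = conn.execute(
        "SELECT content_id, title, release_year FROM content WHERE content_id = ?",
        (content_id,),  # bound parameter: the value is never spliced into the SQL text
    ).fetchone()
    return None if row is None else dict(zip(DETAIL_COLUMNS, row))  # None -> 404


def add_user_rating(conn, user_id, content_id, rating):
    """Write path for a hypothetical POST /add_user_rating route (insert or update)."""
    with conn:  # one transaction per request: commit on success, roll back on error
        conn.execute(
            "INSERT INTO rating (user_id, content_id, rating) VALUES (?, ?, ?) "
            "ON CONFLICT (user_id, content_id) DO UPDATE SET rating = excluded.rating",
            (user_id, content_id, rating),
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (content_id TEXT PRIMARY KEY, title TEXT, release_year INTEGER);
CREATE TABLE rating (
    user_id INTEGER, content_id TEXT REFERENCES content(content_id), rating INTEGER,
    PRIMARY KEY (user_id, content_id)
);
INSERT INTO content VALUES ('tm1', 'Alpha', 2001);
""")
add_user_rating(conn, 1, "tm1", 4)
print(get_movie_details(conn, "tm1"))
```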

- 4.3 Provide Access to Ad-Hoc SQL Queries
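Consider creating an endpoint that accepts raw SQL queries from authorized data analysts. Because this endpoint executes caller-supplied SQL by design, query parameterization cannot protect it; instead, run these queries under the read-only analyst role, accept only single SELECT statements, and cap their runtime (e.g. with PostgreSQL's statement_timeout).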
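A sketch of the read-only execution path, using sqlite3 as a stand-in: SQLite's `PRAGMA query_only` plays the part that the read-only analyst role plays in PostgreSQL. The SELECT-prefix check only exists to give clear error messages; it is not the safeguard, since statements such as a `WITH ... DELETE` pass it (and in PostgreSQL, `SELECT ... INTO` creates a table). `MAX_ROWS` is a hypothetical cap.

```python
import sqlite3

MAX_ROWS = 1000  # hypothetical cap on rows returned per ad hoc query


def run_ad_hoc_query(conn, sql):
    """Run analyst-supplied SQL on a read-only connection; return (columns, rows)."""
    words = sql.split(None, 1)
    if not words or words[0].upper() not in ("SELECT", "WITH"):
        raise ValueError("only SELECT queries are accepted")
    cur = conn.execute(sql)  # the connection itself must refuse writes
    columns = [d[0] for d in cur.description] if cur.description else []
    return columns, cur.fetchmany(MAX_ROWS)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (content_id TEXT PRIMARY KEY, title TEXT);
INSERT INTO content VALUES ('tm1', 'Alpha');
""")
conn.execute("PRAGMA query_only = ON")  # from here on, this connection cannot write

print(run_ad_hoc_query(conn, "SELECT content_id, title FROM content"))
try:
    # Passes the prefix check, but query_only blocks the write.
    run_ad_hoc_query(conn, "WITH x AS (SELECT 1) DELETE FROM content")
except sqlite3.Error as exc:
    print("blocked:", exc)
```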

- 4.4 Ensure Security and Validation
Validate all incoming data to ensure that it adheres to expected formats.
Implement authentication and authorization as needed, ensuring that only authorized users can perform certain actions.
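FastAPI normally handles input validation declaratively through Pydantic (or SQLModel) models, e.g. `Field(ge=1, le=5)`, and rejects invalid request bodies with a 422 response. To show the rules themselves without extra dependencies, here is a plain-dataclass sketch; the 1 to 5 rating scale and the API-key check are assumptions, not decisions made in this plan.

```python
import hmac
from dataclasses import dataclass

MIN_RATING, MAX_RATING = 1, 5  # hypothetical rating scale


def _is_int(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


@dataclass(frozen=True)
class RatingIn:
    """Validated body for a hypothetical POST /add_user_rating request."""
    user_id: int
    content_id: str
    rating: int

    def __post_init__(self):
        if not _is_int(self.user_id) or self.user_id <= 0:
            raise ValueError("user_id must be a positive integer")
        if not isinstance(self.content_id, str) or not self.content_id.strip():
            raise ValueError("content_id must be a non-empty string")
        if not _is_int(self.rating) or not MIN_RATING <= self.rating <= MAX_RATING:
            raise ValueError(f"rating must be an integer from {MIN_RATING} to {MAX_RATING}")


def is_authorized(presented_key: str, expected_key: str) -> bool:
    """Compare API keys in constant time so timing does not leak matching prefixes."""
    return hmac.compare_digest(presented_key.encode(), expected_key.encode())
```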

- 4.5 Document API Usage
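Create clear and concise documentation that explains how to interact with each endpoint, including the expected request format and response. FastAPI generates an OpenAPI schema automatically and serves interactive documentation at /docs by default, which makes a good starting point.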

### 5. Build a Demo Recommender Engine:
#### Objective: Showcase the functionality of the APIs and database.
#### Tasks:
- Develop a simple recommender engine.
- Implement reading and writing data using the developed APIs.
- Test with real or simulated user data.

- 5.1 Develop a Simple Recommender Engine
Choose a recommendation algorithm that fits the scope of your demonstration. Collaborative filtering is a common approach that can be implemented relatively quickly.
You can use libraries like scikit-surprise (imported as `surprise`), which offers various recommendation algorithms.
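To make the collaborative-filtering idea concrete, here is a plain-Python sketch of item-based collaborative filtering: item-to-item cosine similarity over co-rating users, then a similarity-weighted average of the user's own ratings. It illustrates the technique and is not scikit-surprise's API; the rating dictionary is hypothetical.

```python
from math import sqrt


def item_similarity(ratings, a, b):
    """Cosine similarity of items a and b, computed over users who rated both."""
    common = [u for u, r in ratings.items() if a in r and b in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][a] * ratings[u][b] for u in common)
    norm_a = sqrt(sum(ratings[u][a] ** 2 for u in common))
    norm_b = sqrt(sum(ratings[u][b] ** 2 for u in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, user, k=3):
    """Predict scores for items the user has not rated; return the top k."""
    seen = ratings[user]
    all_items = {item for r in ratings.values() for item in r}
    scores = {}
    for candidate in all_items - set(seen):
        num = den = 0.0
        for item, rating in seen.items():
            sim = item_similarity(ratings, candidate, item)
            if sim > 0:
                num += sim * rating
                den += sim
        if den > 0:
            scores[candidate] = num / den  # similarity-weighted average rating
    # Highest predicted score first; ties broken by item id for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


ratings = {  # user -> {content_id: rating}
    "u1": {"tm1": 5, "tm2": 4, "tm3": 1},
    "u2": {"tm1": 4, "tm2": 5, "tm4": 2},
    "u3": {"tm1": 1, "tm3": 5, "tm4": 4},
    "u4": {"tm1": 5, "tm2": 2},
}
print(recommend(ratings, "u4"))
```

With sparse data, a similarity computed from a single co-rating user is always 1.0 and therefore unreliable (tm2 and tm3 above are an example). scikit-surprise's `KNNBasic` with `sim_options={"name": "cosine", "user_based": False}` implements the same item-based idea and accepts a `min_support` option for requiring a minimum overlap.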

- 5.2 Implement Reading and Writing Data Using APIs
Within your recommender engine, make HTTP requests to the APIs you developed in Part 4 to read and write data.
For reading, you might retrieve user preferences, historical ratings, or other relevant information.
For writing, you might store predictions or user feedback.
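Inside the recommender, the read and write calls could be wrapped in small helpers. A sketch using only the standard library's urllib; the base URL (8000 is uvicorn's default port), the query parameter, and the JSON field names are assumptions about how the 4.1 endpoints will be defined. Building the `Request` separately from sending it keeps the helpers testable without a running server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # hypothetical; uvicorn's default port


def movie_details_request(content_id: str) -> Request:
    """Build a read request for the hypothetical /get_movie_details endpoint."""
    query = urlencode({"content_id": content_id})
    return Request(f"{BASE_URL}/get_movie_details?{query}", method="GET")


def add_rating_request(user_id: int, content_id: str, rating: int) -> Request:
    """Build a write request for the hypothetical /add_user_rating endpoint."""
    body = json.dumps({"user_id": user_id, "content_id": content_id, "rating": rating})
    return Request(f"{BASE_URL}/add_user_rating", data=body.encode(), method="POST",
                   headers={"Content-Type": "application/json"})


def send(req: Request, timeout: float = 10.0):
    """Send a request and decode the JSON response body."""
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode())
```

In a notebook, the data scientists might prefer a third-party client such as requests; urllib is used here only to keep the sketch dependency-free.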

- 5.3 Test with Real or Simulated User Data
Create tests that mimic real user interactions, or use a dataset that resembles what real users might provide.
Ensure that the recommender engine can make reasonable predictions and that the read and write operations function correctly.
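Until real user data exists, simulated ratings can be generated reproducibly so that tests and recommender runs are repeatable. A minimal sketch with hypothetical IDs and a hypothetical 1 to 5 scale: a seeded `random.Random` instance gives the same rows on every run, and `sample` avoids duplicate (user, content) pairs.

```python
import random


def simulate_ratings(user_ids, content_ids, ratings_per_user=3, seed=42):
    """Generate reproducible synthetic (user_id, content_id, rating) rows."""
    rng = random.Random(seed)  # private generator: same seed, same rows
    rows = []
    for user in user_ids:
        k = min(ratings_per_user, len(content_ids))
        for content in rng.sample(content_ids, k):  # no repeated titles per user
            rows.append((user, content, rng.randint(1, 5)))
    return rows


print(simulate_ratings([1, 2], ["tm1", "tm2", "tm3", "tm4"]))
```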

### 6. Provide Improvement Suggestions:
#### Objective: Analyze the solution and propose improvements.
#### Tasks:
- Review the entire solution.
- Identify areas for potential improvements.
- Document suggestions.

### Bonus Challenges (Optional):
- Write unit and integration tests for the solution.
- Build an actual recommender engine to recommend movies.
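A unit test for a write handler might look like this with the standard library's unittest. The handler is a hypothetical, self-contained copy so the example runs on its own, and in-memory sqlite3 stands in for a PostgreSQL test database; integration tests would instead exercise the running FastAPI app.

```python
import sqlite3
import unittest


def add_user_rating(conn, user_id, content_id, rating):
    """Hypothetical handler under test: validate, then upsert one rating."""
    if not 1 <= rating <= 5:
        raise ValueError("rating out of range")
    with conn:
        conn.execute(
            "INSERT INTO rating (user_id, content_id, rating) VALUES (?, ?, ?) "
            "ON CONFLICT (user_id, content_id) DO UPDATE SET rating = excluded.rating",
            (user_id, content_id, rating),
        )


class AddUserRatingTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test keeps tests independent.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE rating (user_id INTEGER, content_id TEXT, rating INTEGER, "
            "PRIMARY KEY (user_id, content_id))"
        )

    def tearDown(self):
        self.conn.close()

    def test_rerating_overwrites(self):
        add_user_rating(self.conn, 1, "tm1", 3)
        add_user_rating(self.conn, 1, "tm1", 5)
        rows = self.conn.execute("SELECT rating FROM rating").fetchall()
        self.assertEqual(rows, [(5,)])

    def test_out_of_range_rejected(self):
        with self.assertRaises(ValueError):
            add_user_rating(self.conn, 1, "tm1", 9)
```

Saved as, for example, `test_ratings.py`, it runs with `python -m unittest`; pytest also collects `unittest.TestCase` classes.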