# Investment Opportunities Analysis: Evaluating CDs, T-Bills, and ETFs for Optimal Portfolio Performance

## Overview
This project provides a detailed examination of financial investments, focusing on Certificates of Deposit (CDs), Treasury Bills (T-Bills), and Exchange-Traded Funds (ETFs). Using data from Bankrate, the U.S. Department of Treasury, and Yahoo Finance, the study analyzes the performance and risks of each investment type to identify which options best balance safety and returns.

The goal is to provide novice investors with clear insights to help them make informed decisions. By understanding these dynamics, the project aims to guide new investors toward choices that fit their financial goals and risk tolerance.

## Technologies Used
- Python
- Jupyter Notebook
- Other Libraries: Pandas, NumPy, Seaborn, Matplotlib, Requests, DrissionPage

## APIs and Data Sources
- Bankrate: Source for the latest rates on Certificates of Deposit through web scraping techniques.
- U.S. Department of Treasury: Provides data on Treasury Bills through their public API.
- Yahoo Finance: Source for Exchange-Traded Funds (ETFs) data, accessed through web scraping techniques.

## Dataset Download Instructions (Data_Collection.ipynb)
Each dataset has its own download function in the notebook. Each function saves the data directly to a CSV file and can fetch either the complete dataset or a smaller, five-row sample for preliminary analysis.

Names of the datasets from the web:

1. **Bankrate CD Rates**
2. **Treasury Bills Rates**
3. **Yahoo Finance ETFs**

### Prerequisites

Before running these scripts in a Jupyter Notebook, ensure the following are installed:
- Python 3.7 or newer
- Requests library
- DrissionPage library

You can install the necessary Python libraries using pip. Run the following commands in a Jupyter cell:

```python
!pip install requests
!pip install DrissionPage
```


### How to Run

Each dataset download can be initiated by calling its respective function with appropriate parameters:

### Downloading Bankrate CD Rates
To download data on CD rates:

`download_dataset1(out_file1: str, sample_p: bool = True)`

* `out_file1`: Base name of the output CSV file where the data will be saved.
* `sample_p`: Boolean; set to `True` for a sample of the data (5 entries), or `False` for the full dataset.

`download_dataset1("bankrate", sample_p=True)`

* Running this call saves a CSV file named `bankrate_samples.csv` containing only 5 entries, because `sample_p` is set to `True`.
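
The sample-versus-full behavior described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: `save_rows` is a hypothetical helper that only handles the saving step, the row contents are made up, and the file name used for the full dataset (`<base>.csv`) is an assumption, since only the `_samples.csv` suffix is documented here.

```python
import csv
from pathlib import Path

SAMPLE_SIZE = 5  # a sample contains 5 entries, as described above


def save_rows(rows, out_file, sample_p=True):
    """Write rows (a list of dicts) to CSV and return the path written.

    Hypothetical helper: the notebook's functions also fetch the data.
    """
    if sample_p:
        rows = rows[:SAMPLE_SIZE]
        path = Path(f"{out_file}_samples.csv")
    else:
        # Assumption: the name of the full-dataset file is not documented.
        path = Path(f"{out_file}.csv")

    fieldnames = list(rows[0].keys()) if rows else []
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Calling `save_rows(rows, "bankrate", sample_p=True)` with more than five rows would write a header plus five data rows to `bankrate_samples.csv`.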


### Downloading Treasury Bills Rates
To download data on Treasury Bills rates:

`download_dataset2(out_file2: str, sample_p: bool = True)`

* `out_file2`: Base name of the output CSV file where the data will be saved.
* `sample_p`: Boolean; set to `True` for a sample of the data (5 entries), or `False` for the full dataset.

`download_dataset2("Treasury Bills Rates", sample_p=True)`

* Running this call saves a CSV file named `Treasury Bills Rates_samples.csv` containing only 5 entries, because `sample_p` is set to `True`.

**price_per100** contains some null values. In the final project analysis, these will either be filled using a data imputation technique or simply dropped. For now, they are left as they are.

### Downloading Yahoo Finance ETFs
To download ETF data from Yahoo Finance:

`download_dataset3(out_file3: str, sample_p=True)`

* `out_file3`: Base name of the output CSV file where the data will be saved.
* `sample_p`: Boolean; set to `True` for a sample of the data (5 entries), or `False` for the full dataset.

`download_dataset3("Yahoo Finance ETFs", sample_p=True)`

* Running this call saves a CSV file named `Yahoo Finance ETFs_samples.csv` containing only 5 entries, because `sample_p` is set to `True`.

## Data Cleaning, Analysis, and Visualization (Data_Analysis.ipynb)

### Data Collection
- Download data directly from online sources into CSV files using the functions in Data_Collection.ipynb.

### Data Processing
- Standardization: Converts time-related terms to a consistent format by standardizing duration terms and converting years to months.
- Numeric Conversion and Cleaning: Ensures all financial figures are numeric by removing characters such as '%' and '$' and converting strings to float or integer values.
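
The two processing steps above could look roughly like the sketch below. The helper names `term_to_months` and `to_number` are hypothetical, and the input formats they accept (for example `"1 year"`, `"6-month"`, `"4.75%"`, `"$1,000"`) are assumptions about what the raw data looks like, not a description of the notebook's code.

```python
import re


def term_to_months(term):
    """Convert a duration such as '1 year', '18 months' or '6-month' to months."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)[\s-]*([a-zA-Z]+)", term)
    if match is None:
        raise ValueError(f"unrecognized term: {term!r}")
    value = float(match.group(1))
    unit = match.group(2).lower()
    if unit.startswith("y"):  # yr, year, years
        return round(value * 12)
    if unit.startswith("m"):  # mo, month, months
        return round(value)
    raise ValueError(f"unrecognized unit in term: {term!r}")


def to_number(text):
    """Strip '%', '$' and thousands separators, then convert to float."""
    cleaned = text.replace("%", "").replace("$", "").replace(",", "").strip()
    return float(cleaned)
```

With pandas, helpers like these can be applied to a column with `Series.map`, for example `df["term"].map(term_to_months)`, where the column name is again only illustrative.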

### Data Cleaning
- Robust Data Cleaning Methods: Imputes missing values with the per-category median and removes unnecessary or erroneous columns.
- Type Conversions for Accurate Analysis: Converts data types for critical variables, for example converting 'Volume' in the ETF data from text to integers (accounting for unit suffixes such as 'M' for millions and 'K' for thousands) and converting percentage changes to float values for precise calculations.
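
As a rough illustration of the cleaning steps above, the sketch below parses volume strings with 'K' and 'M' suffixes and fills missing values with the median of their category. Both function names are hypothetical, and the code uses only the standard library so that it stands alone; the notebook itself may well do this with pandas.

```python
from collections import defaultdict
from statistics import median

_UNIT_MULTIPLIERS = {"K": 1_000, "M": 1_000_000}


def parse_volume(text):
    """Convert strings such as '1.2M', '850K' or '12,345' to an integer."""
    cleaned = text.replace(",", "").strip().upper()
    multiplier = _UNIT_MULTIPLIERS.get(cleaned[-1:], 1)
    if multiplier != 1:
        cleaned = cleaned[:-1]  # drop the unit suffix
    return int(round(float(cleaned) * multiplier))


def impute_median_by_category(rows, category_key, value_key):
    """Return copies of rows with None values replaced by the category median."""
    observed = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            observed[row[category_key]].append(row[value_key])
    medians = {cat: median(values) for cat, values in observed.items()}

    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[value_key] is None:
            # Stays None when the whole category has no observed value.
            new_row[value_key] = medians.get(new_row[category_key])
        filled.append(new_row)
    return filled
```

In pandas, the same imputation is commonly written as `df[col].fillna(df.groupby(cat)[col].transform("median"))`, where `col` and `cat` stand for the value and category columns.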

### Exploratory Data Analysis
- Visualization: Uses several visualization techniques to compare investment performance and risk profiles and to identify trends and outliers in each dataset.
- Descriptive Statistics: Provides a statistical summary of each dataset, giving insight into central tendency, variability, and distribution to help understand market conditions.

### Modeling and Insights
- Comparative Analysis: Compares the investment options using the cleaned and processed data, offering insight into which investments are less risky and potentially more profitable.
- Risk Assessment and Performance Evaluation: Evaluates the risk and return profiles of CDs, T-Bills, and ETFs to guide potential investment strategies, which is particularly useful for novice investors.
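
One simple way to put a number on a risk and return profile is sketched below. The choice of metrics is an assumption: this README does not state which measures the notebook uses, so the sketch takes the sample standard deviation of periodic returns as a stand-in for risk. The function name `summarize_returns` is hypothetical.

```python
from statistics import mean, stdev


def summarize_returns(returns):
    """Summarize a series of periodic returns (e.g. in percent).

    Returns the mean return, the sample standard deviation as a risk proxy,
    and their ratio (None when the series has no variation).
    """
    if len(returns) < 2:
        raise ValueError("need at least two observations")
    avg = mean(returns)
    risk = stdev(returns)
    ratio = avg / risk if risk > 0 else None
    return {
        "mean_return": avg,
        "volatility": risk,
        "return_per_unit_risk": ratio,
    }
```

A fixed-rate product such as a CD would show zero volatility under this measure, so the ratio is left as `None` instead of dividing by zero; comparing it with an ETF then comes down to weighing the mean return against the ETF's volatility.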