This project downloads a CSV file from a given URL, processes the data to separate successful and unsuccessful flights, and then uploads the processed data to a PostgreSQL database.
- Downloads a CSV file from a remote source.
- Processes the dataset to filter flights based on their cancellation status.
- Uploads the processed data to a PostgreSQL database.
Ensure you have the following dependencies installed:
pandasrequestssqlalchemypsycopg2json
Install dependencies using pip:
pip install pandas requests sqlalchemy psycopg2Create a config.json file in the project directory with the following structure:
{
"warehouse": {
"user": "your_db_user",
"password": "your_db_password",
"host": "your_db_host",
"port": "your_db_port",
"database": "your_database_name"
}
}csv-to-db-project/
│
├── csv_module.py # Core functionality
├── config.json # Database configuration
├── main.py # Example script
└── README.md # Documentation
The CSV file contains flight records with the following columns:
- flight_id (int): Unique identifier
- departure (str): Departure airport code
- arrival (str): Arrival airport code
- was_cancelled (bool): Flight cancellation status
- delay_minutes (int): Delay in minutes
Example row:
flight_id,departure,arrival,was_cancelled,delay_minutes
1001,JFK,LAX,False,15
- Import the required functions:
from csv_module import csv_downloader, send_data_to_db- Set the CSV file URL:
CSV_URL = "https://drive.usercontent.google.com/u/0/uc?id=1TQRSONybodmFG7yBrMuCxb2JRFVaowJR&export=download"- Download the CSV file and load it into a DataFrame:
flights = csv_downloader(csv_url=CSV_URL)- Process the data:
successful_flights = flights[flights['was_cancelled'] == True]
unsuccessful_flights = flights[flights['was_cancelled'] == False]
print(successful_flights.shape)
print(unsuccessful_flights.shape)- Send data to the database:
table1 = "completed_flights"
table2 = "failed_flights"
send_data_to_db(df=successful_flights, table_name=table1)
send_data_to_db(df=unsuccessful_flights, table_name=table2)- Downloads the CSV file from the provided URL and loads it into a pandas DataFrame.
- Handles request errors gracefully.
- Uploads the given DataFrame to the specified PostgreSQL table.
- Uses database credentials from the
config.jsonfile.
- Ensure your database is running and accessible.
- Modify the database credentials in
config.jsonaccordingly.
This project is open-source and available for modification and use under the MIT license.