This project focuses on time series forecasting of air quality data using the Facebook Prophet algorithm. The dataset contains hourly averaged responses from an array of metal oxide chemical sensors deployed in an Italian city, along with ground truth concentrations for various pollutants provided by a co-located reference certified analyzer.
The dataset used in this project is the "Air Quality Dataset" from the UCI Machine Learning Repository. It was collected between March 2004 and February 2005 and covers a full year of readings. The data is provided in the AirQualityUCI.csv file.
The Air_Quality_Forecasting.ipynb Jupyter Notebook contains the code for preprocessing the data and performing time series forecasting using the Facebook Prophet algorithm.
The notebook includes a comprehensive preprocessing section that handles the following tasks:
- Data Loading: The script loads the dataset from the provided AirQualityUCI.csv file, handling the semicolon-separated values and comma-based decimal representation.
- Missing Value Handling: The script identifies and replaces missing values (tagged with -200) with NaN, and then fills the NaN values with the mean of the respective columns.
- Date and Time Conversion: The script converts the 'Date' column from 'DD/MM/YYYY' format to 'YYYY-MM-DD' format and the 'Time' column from 'HH.MM.SS' format to 'HH:MM:SS' format, suitable for time series analysis.
- Feature Engineering: A new 'ds' (date-time stamp) column is created by combining the 'Date' and 'Time' columns, and this column is converted to datetime format.
- Target Variable Selection: The script selects the 'Relative Humidity (RH)' column as the target variable for forecasting.
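The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `preprocess` helper and the three sample rows are made up for the example, and the sketch parses the original date and time strings directly instead of reformatting them first.

```python
import io

import numpy as np
import pandas as pd

# Made-up sample rows in the layout the README describes:
# semicolon-separated fields, comma decimals, -200 marking a missing value.
SAMPLE = """Date;Time;T;RH
10/03/2004;18.00.00;13,6;48,9
10/03/2004;19.00.00;13,3;-200
10/03/2004;20.00.00;11,9;54,0
"""


def preprocess(source):
    """Hypothetical helper: load, clean, and timestamp the air quality data."""
    # Semicolon separator and comma decimal mark, as in AirQualityUCI.csv.
    df = pd.read_csv(source, sep=";", decimal=",")

    # Replace the -200 sentinel with NaN, then fill NaN with the column mean.
    numeric = df.select_dtypes(include="number").columns
    df[numeric] = df[numeric].replace(-200, np.nan)
    df[numeric] = df[numeric].fillna(df[numeric].mean())

    # Combine 'Date' (DD/MM/YYYY) and 'Time' (HH.MM.SS) into a datetime 'ds' column.
    df["ds"] = pd.to_datetime(
        df["Date"] + " " + df["Time"], format="%d/%m/%Y %H.%M.%S"
    )
    return df


clean = preprocess(io.StringIO(SAMPLE))
print(clean[["ds", "RH"]])
```

To run this against the real file, pass the path `AirQualityUCI.csv` instead of the in-memory sample. The real file may contain extra empty columns or rows that need dropping first.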
The notebook includes a section for time series forecasting using the Facebook Prophet algorithm:
- Data Preparation: The script creates a new DataFrame with the 'ds' column holding the timestamps and the 'RH' column as the target variable ('y'), the two column names Prophet expects.
- Prophet Model Initialization: The script initializes a new instance of the Facebook Prophet model.
- Model Training: The script trains the Prophet model on the prepared data.
- Future Dataframe Generation: The script creates a future dataframe with hourly intervals for the next 365 hours (approximately 15 days) using the model.make_future_dataframe() function.
- Forecasting: The script generates the forecast using the trained model and the future dataframe with model.predict().
- Visualization: The script includes two visualization functions:
  - model.plot(forecast): Plots the forecasted values and the actual values from the training data.
  - model.plot_components(forecast): Plots the different components (trend, weekly, and yearly seasonality) of the time series data.
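The forecasting steps above might look like the sketch below. The helper names `to_prophet_frame` and `forecast_hours` and the demo values are assumptions made for illustration, not the notebook's own code. The import assumes the `prophet` package name; older installs use `fbprophet` instead.

```python
import pandas as pd


def to_prophet_frame(df, target="RH"):
    """Hypothetical helper: return the two-column frame ('ds', 'y') Prophet expects."""
    return df[["ds", target]].rename(columns={target: "y"})


def forecast_hours(frame, hours=365):
    """Hypothetical helper: fit Prophet and forecast `hours` hourly steps ahead."""
    # Imported here so the data-preparation helper works without Prophet installed.
    from prophet import Prophet  # older releases: from fbprophet import Prophet

    model = Prophet()
    model.fit(frame)

    # Extend the training timestamps by `hours` hourly steps (365 hours is about 15 days).
    future = model.make_future_dataframe(periods=hours, freq="h")
    forecast = model.predict(future)
    return model, forecast


# Made-up hourly values, standing in for the preprocessed dataset.
demo = pd.DataFrame(
    {
        "ds": pd.date_range("2004-03-10 18:00", periods=4, freq="h"),
        "RH": [48.9, 47.7, 54.0, 60.0],
    }
)
frame = to_prophet_frame(demo)
print(frame.columns.tolist())
```

With Prophet installed, `model, forecast = forecast_hours(frame)` returns the fitted model and the forecast, which can then be passed to `model.plot(forecast)` and `model.plot_components(forecast)`.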
To run the project, follow these steps:
- Clone this repository:
git clone https://github.com/amangupta143/Air-Quality-Prediction-FB-Prophet.git
- Install the required dependencies (pandas, numpy, and fbprophet; the package has been published as prophet since version 1.0).
- Open the Air_Quality_Forecasting.ipynb notebook in your preferred Jupyter Notebook environment.
- Run the cells in the notebook to preprocess the data, train the Prophet model, and generate the forecast.
The notebook will display the visualizations and the forecasted values.
Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
To contribute:
- Fork the repository
- Create a new branch for your feature or bug fix
- Make your changes and commit them with descriptive commit messages
- Push your changes to your forked repository
- Submit a pull request to the main repository
This project is licensed under the MIT License.
The Air Quality dataset is provided by the UCI Machine Learning Repository: Air Quality Data Set.
The Facebook Prophet library is developed and maintained by Facebook's Core Data Science team: Facebook Prophet.
Happy coding! 🚀