This project uses the Facebook metrics dataset from Dr. David Akman's GitHub repository. The dataset covers posts published in 2014 on the Facebook page of a renowned cosmetics brand, together with the Facebook metrics recorded for each post.
The primary goal of this project was to apply the knowledge, tools, and skills acquired during our course to draw insights from the Facebook metrics dataset. The specific objectives were:
- Data Understanding: Gain a comprehensive understanding of the dataset, its structure, and the variables it contains.
- Data Cleaning and Preprocessing: Clean and preprocess the dataset to make it suitable for analysis.
- Exploratory Data Analysis (EDA): Utilize the preprocessed dataset to visualize and explore trends, patterns, and relationships within the data.
- Statistical Modeling: Fit multiple linear regression models to predict various Facebook metrics and to understand the factors that influence them.
In the first phase, we cleaned and preprocessed the dataset. This involved handling missing values, removing duplicates, and standardizing data formats.
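The three cleaning tasks named above can be sketched with Pandas as follows. This is a minimal illustration, not the project's actual notebook code: the rows are made up, the column names (`Type`, `Paid`, `like`) are assumed for illustration, and the choice to fill a missing flag with its mode and to drop rows with a missing count is one reasonable option among several.

```python
import pandas as pd

# Hypothetical stand-in rows; column names and values are illustrative only.
raw = pd.DataFrame({
    "Type": ["Photo", "photo ", "Status", "Status", "Link"],
    "Paid": [0.0, 1.0, None, None, 1.0],
    "like": [79.0, 130.0, 66.0, 66.0, None],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize formats, remove duplicates, and handle missing values."""
    out = df.copy()
    # Standardize text formats: trim whitespace and unify capitalization.
    out["Type"] = out["Type"].str.strip().str.title()
    # Remove exact duplicate rows (pandas treats matching NaNs as equal here).
    out = out.drop_duplicates()
    # Missing values, option 1: fill a binary flag with its most common value.
    out["Paid"] = out["Paid"].fillna(out["Paid"].mode().iloc[0])
    # Missing values, option 2: drop rows that lack the metric of interest.
    out = out.dropna(subset=["like"])
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Standardizing formats before removing duplicates matters here: rows that differ only in capitalization or stray whitespace would otherwise not be recognized as the same category.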
In the second phase, we performed exploratory data analysis on the preprocessed dataset. We plotted graphs to uncover trends, patterns, and relationships among the Facebook metrics.
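As a rough illustration of this kind of exploration, the sketch below builds a bar chart of a grouped mean and a scatter plot of two metrics with Matplotlib. The data frame, its column names (`Type`, `like`, `share`), and the choice of plots are assumptions made for the example, not the figures produced in the project.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in rows; column names and values are illustrative only.
posts = pd.DataFrame({
    "Type": ["Photo", "Photo", "Status", "Link", "Status", "Photo"],
    "like": [79, 130, 66, 20, 90, 150],
    "share": [17, 29, 14, 4, 20, 33],
})

# Pattern across categories: average likes per post type, largest first.
mean_likes = posts.groupby("Type")["like"].mean().sort_values(ascending=False)

fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(9, 3.5))

ax_bar.bar(mean_likes.index, mean_likes.values)
ax_bar.set_xlabel("Post type")
ax_bar.set_ylabel("Mean likes")
ax_bar.set_title("Mean likes by post type")

# Relationship between two metrics: likes against shares.
ax_scatter.scatter(posts["like"], posts["share"])
ax_scatter.set_xlabel("Likes")
ax_scatter.set_ylabel("Shares")
ax_scatter.set_title("Likes vs. shares")

fig.tight_layout()
```

In a Jupyter Notebook the figure renders below the cell automatically; in a plain script, a call to `plt.show()` or `fig.savefig(...)` would be needed.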
In the final phase, we conducted statistical modeling using multiple linear regression. This involved building models to predict Facebook metrics based on various factors. The process included:
- Full Model Overview: Understanding all variables included in the model.
- Regression Formula: Formulating the ordinary least squares (OLS) model.
- Full Model Diagnostic Checks: Assessing the assumptions and goodness-of-fit of the full model.
- Reduced Model Overview: Identifying significant variables and creating a reduced model.
- Reduced Model Diagnostic Checks: Evaluating the reduced model's performance and accuracy.
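The full-model and reduced-model workflow above can be sketched with NumPy alone. Everything in this example is an assumption made for illustration: the data are synthetic, `fit_ols` is a hypothetical helper, and the `|t| > 2` cutoff is only a rough rule of thumb for significance rather than the selection criterion used in the project.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors: x1 and x2 drive the response, x3 is pure noise.
X = rng.normal(size=(n, 3))
y = 2.0 + 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)


def fit_ols(predictors: np.ndarray, response: np.ndarray):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares.

    Returns coefficients, t-statistics, R-squared, and residuals.
    """
    design = np.column_stack([np.ones(len(response)), predictors])
    beta, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ beta
    dof = len(response) - design.shape[1]
    sigma2 = resid @ resid / dof  # unbiased residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stats = beta / np.sqrt(np.diag(cov))
    r2 = 1.0 - (resid @ resid) / np.sum((response - response.mean()) ** 2)
    return beta, t_stats, r2, resid


# Full model: all candidate predictors.
beta_full, t_full, r2_full, resid_full = fit_ols(X, y)

# Reduced model: keep predictors whose |t| exceeds a rough threshold of 2.
keep = [i for i in range(X.shape[1]) if abs(t_full[i + 1]) > 2]
beta_red, t_red, r2_red, resid_red = fit_ols(X[:, keep], y)

print("full coefficients:", np.round(beta_full, 3), "R^2:", round(r2_full, 4))
print("kept predictors:", keep, "reduced R^2:", round(r2_red, 4))
```

The residuals returned by `fit_ols` are the starting point for diagnostic checks such as residual-versus-fitted plots or a histogram of residuals. Because the reduced model is nested in the full one, its R-squared can never be higher, so a small drop after removing predictors suggests they contributed little.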
Through this project, we used the Facebook metrics dataset to examine the performance of posts on the cosmetics brand's Facebook page. Combining data cleaning, exploratory analysis, and statistical modeling allowed us to identify patterns in the data and to understand the factors influencing Facebook metrics.
- Jupyter Notebook: Jupyter Notebook was utilized as the primary environment for conducting data analysis, performing code execution, and documenting the project workflow.
- Python: Python served as the programming language for data manipulation, analysis, and modeling tasks.
- Pandas: Pandas was employed for data cleaning, preprocessing, and data manipulation tasks.
- NumPy: NumPy was used for numerical operations and array manipulation.
- Matplotlib: Matplotlib was utilized for creating plots and visualizations to explore and analyze the data.
The Facebook metrics dataset used in this project was sourced from Dr. David Akman's GitHub repository. The dataset is publicly available and contains information on Facebook posts published by a cosmetics brand during the year 2014.
- Athul Varghese Thampan
- Mohammed Bilal Naeem