This project is a statistical analysis developed as a final assignment. It examines food consumption patterns across countries and the carbon emissions associated with them. Using Python, I explored data distributions, calculated probabilities, and ran correlation analyses to identify which foods contribute most to emissions.
As a student focusing on Data Science fundamentals, I implemented the following steps in the Statistics_FA_Lorenzo_Biscardi.ipynb notebook:
- Summary Statistics: Calculating mean, median, and spread to understand food consumption across various categories (e.g., beef, poultry, grains).
- Probability and Distributions: Using distributions.csv to model data behavior and calculate the likelihood of specific consumption levels.
- Correlation Analysis: Measuring the strength of the relationship between consumption of each food type and CO2 emissions using the food_consumption.csv dataset.
- Data Visualization: Creating histograms and boxplots to visualize outliers and the symmetry of consumption data.
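
The first three steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`food_category`, `consumption`, `co2_emission`), the threshold, and the small inline table are assumptions made up so the example runs without the CSV files. In the notebook, the data would presumably be loaded with `pd.read_csv("food_consumption.csv")` instead.

```python
import pandas as pd

# Hypothetical toy data standing in for food_consumption.csv.
# Column names and values are illustrative assumptions, not real figures.
df = pd.DataFrame({
    "food_category": ["beef", "beef", "beef", "poultry", "poultry", "poultry"],
    "consumption": [10.0, 20.0, 30.0, 5.0, 15.0, 25.0],      # kg per person per year
    "co2_emission": [300.0, 600.0, 900.0, 5.0, 15.0, 25.0],  # kg CO2 per person per year
})

# Summary statistics: center (mean, median) and spread (sample std) per category.
summary = df.groupby("food_category")["consumption"].agg(["mean", "median", "std"])

# Empirical probability: share of observations above a chosen threshold.
threshold = 18.0  # arbitrary example value
p_above = (df["consumption"] > threshold).mean()

# Pearson correlation between consumption and CO2 emissions.
r = df["consumption"].corr(df["co2_emission"])

print(summary)
print(f"P(consumption > {threshold}) = {p_above:.2f}")
print(f"Pearson r = {r:.3f}")
```

The empirical share used here is only one way to estimate a probability; fitting a named distribution to distributions.csv would be an alternative, depending on what that file contains.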

The repository contains the following files:
- Statistics_FA_Lorenzo_Biscardi.ipynb: The main Jupyter Notebook containing all Python code, statistical formulas, and data visualizations.
- food_consumption.csv: Dataset detailing kilograms of food consumed per person per year and the resulting CO2 emissions.
- distributions.csv: Supporting data used for probability and distribution exercises.

The analysis uses the following tools:
- Language: Python 3.x
- Libraries:
  - Pandas: For data manipulation and cleaning.
  - NumPy: For mathematical and statistical operations.
  - Matplotlib / Seaborn: For descriptive statistical plotting.

Main outcomes of the analysis:
- Identified food categories with the highest environmental impact (CO2 emissions).
- Analyzed how consumption variance differs between animal-based and plant-based diets.
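
A comparison like the two above could be computed along these lines. This is a hedged sketch under stated assumptions: the column names, the `diet_type` mapping of categories to animal-based or plant-based, and every number in the toy table are hypothetical and carry no actual finding from the project.

```python
import pandas as pd

# Hypothetical toy data; names and numbers are illustrative assumptions only.
df = pd.DataFrame({
    "food_category": ["beef", "beef", "poultry", "poultry",
                      "wheat", "wheat", "rice", "rice"],
    "consumption": [10.0, 30.0, 20.0, 40.0, 50.0, 54.0, 30.0, 34.0],
    "co2_emission": [300.0, 900.0, 20.0, 40.0, 10.0, 12.0, 35.0, 45.0],
})

# Assumed grouping of food categories into animal-based and plant-based.
diet_type = {"beef": "animal", "poultry": "animal", "wheat": "plant", "rice": "plant"}
df["diet_type"] = df["food_category"].map(diet_type)

# Food category with the highest mean CO2 emissions.
mean_co2 = df.groupby("food_category")["co2_emission"].mean()
top_category = mean_co2.idxmax()

# Sample variance of consumption within each diet type.
variance_by_diet = df.groupby("diet_type")["consumption"].var()

print(f"Highest mean CO2 emissions: {top_category}")
print(variance_by_diet)
```

Note that pandas computes the sample variance (dividing by n - 1) by default; pass `ddof=0` to `var()` if the population variance is wanted.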