<span style="font-size:3em;"> Analysis of the global plant-based meat market </span> 

This is an analysis of the global plant-based meat market that considers three axes for each country: water shortages, meat consumption relative to production, and the percentage of vegetarians. These are combined into a coefficient to predict which countries are best suited to the market.

This notebook covers the cleaning and preparation of the vegetarian-population data.
The resulting CSV file, along with the other parts of the analysis, was used to build a dashboard in Tableau, which you can find here:
https://public.tableau.com/app/profile/rossana.coro/viz/VegetalMeatProject/Accueil

In [12]:
import pandas as pd
import numpy as np

In [13]:
#Import csv file from folder
dataset_brut = pd.read_csv("/Users/rossana/Desktop/demoday/fichiers bruts/vegetarianism-by-country-2023.csv")
dataset_brut.head()

Unnamed: 0,place,pop2023,growthRate,area,country,cca3,cca2,ccn3,region,subregion,unMember,officialName,landAreaKm,density,densityMi,Rank,vegetarianismByCountry_percVeg,vegetarianismByCountry_totVeg
0,356,1428627663,0.00808,3287590,India,IND,IN,356,Asia,"Southern Asia, South Central Asia",True,Republic of India,2973190.0,480.5033,1244.5036,1,24.0,419000000
1,156,1425671352,-0.00015,9706961,China,CHN,CN,156,Asia,Eastern Asia,True,People's Republic of China,9424702.9,151.2696,391.7884,2,5.0,50000000
2,840,339996563,0.00505,9372610,United States,USA,US,840,North America,Northern America,True,United States of America,9147420.0,37.1686,96.2666,3,5.0,16000000
3,76,216422446,0.00515,8515767,Brazil,BRA,BR,76,South America,"South America, Latin America",True,Federative Republic of Brazil,8358140.0,25.8936,67.0645,7,14.0,29260000
4,643,144444359,-0.00186,17098242,Russia,RUS,RU,643,Europe,Eastern Europe,True,Russian Federation,16376870.0,8.82,22.8439,9,1.0,1400000


In [14]:
#Drop the columns that are not needed for the analysis
dataset = dataset_brut.drop(columns=["place","growthRate","area","cca2","ccn3","unMember","officialName","landAreaKm","density","densityMi"])
dataset.head()

Unnamed: 0,pop2023,country,cca3,region,subregion,Rank,vegetarianismByCountry_percVeg,vegetarianismByCountry_totVeg
0,1428627663,India,IND,Asia,"Southern Asia, South Central Asia",1,24.0,419000000
1,1425671352,China,CHN,Asia,Eastern Asia,2,5.0,50000000
2,339996563,United States,USA,North America,Northern America,3,5.0,16000000
3,216422446,Brazil,BRA,South America,"South America, Latin America",7,14.0,29260000
4,144444359,Russia,RUS,Europe,Eastern Europe,9,1.0,1400000


In [15]:
#Rename columns
dataset = dataset.rename(columns={"pop2023": "Population_2023", "country": "Country",  "cca3": "Location_code", "region": "Continent","subregion": "Sub_continent","vegetarianismByCountry_percVeg": "Vegetarianism_percentage","vegetarianismByCountry_totVeg":"Vegetarian_population"})
dataset.head()

Unnamed: 0,Population_2023,Country,Location_code,Continent,Sub_continent,Rank,Vegetarianism_percentage,Vegetarian_population
0,1428627663,India,IND,Asia,"Southern Asia, South Central Asia",1,24.0,419000000
1,1425671352,China,CHN,Asia,Eastern Asia,2,5.0,50000000
2,339996563,United States,USA,North America,Northern America,3,5.0,16000000
3,216422446,Brazil,BRA,South America,"South America, Latin America",7,14.0,29260000
4,144444359,Russia,RUS,Europe,Eastern Europe,9,1.0,1400000


In [16]:
#Check the country codes so they can be matched against the main dataset
country_codes = dataset['Location_code'].unique()
print(sorted(country_codes))

['ARG', 'BEL', 'BRA', 'CAN', 'CHE', 'CHL', 'CHN', 'CZE', 'DEU', 'DNK', 'ESP', 'EST', 'FIN', 'FRA', 'GRC', 'IND', 'IRL', 'ISR', 'ITA', 'JAM', 'JPN', 'KOR', 'LTU', 'LVA', 'MEX', 'NLD', 'NOR', 'NZL', 'PHL', 'POL', 'PRT', 'RUS', 'SGP', 'SVN', 'SWE', 'THA', 'TWN', 'UKR', 'USA', 'VNM']


In [17]:
#Keep only the countries that also appear in the main dataset, so the two can be joined
country_list = ['AFR', 'ARG', 'ASP', 'AUS', 'BRA', 'BRICS', 'CAN', 'CHE', 'CHL', 'CHN', 'COL', 'DVD', 'DVG', 'EGY', 'ETH', 'EUN', 'EUR', 'GBR', 'IDN', 'IND', 'IRN', 'ISR', 'JPN', 'KAZ', 'KOR', 'LAC', 'MEX', 'MYS', 'NGA', 'NOA', 'NOR', 'NZL', 'OCD', 'OECD', 'PAK', 'PER', 'PHL', 'PRY', 'RUS', 'SAU', 'THA', 'TUR', 'UKR', 'USA', 'VNM', 'WLD', 'ZAF']

dataset = dataset[dataset['Location_code'].isin(country_list)]
dataset.shape

(19, 8)
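
The filter above keeps 19 of the 40 codes. As a minimal sketch of how to see what the filter drops on each side, the two code lists printed in this notebook can be compared as sets. The variable names `veg_codes`, `main_codes`, `kept`, `dropped` and `missing` are illustrative and not part of the original notebook.

```python
# Codes present in the vegetarianism file (copied from the printed output above)
veg_codes = {'ARG', 'BEL', 'BRA', 'CAN', 'CHE', 'CHL', 'CHN', 'CZE', 'DEU', 'DNK',
             'ESP', 'EST', 'FIN', 'FRA', 'GRC', 'IND', 'IRL', 'ISR', 'ITA', 'JAM',
             'JPN', 'KOR', 'LTU', 'LVA', 'MEX', 'NLD', 'NOR', 'NZL', 'PHL', 'POL',
             'PRT', 'RUS', 'SGP', 'SVN', 'SWE', 'THA', 'TWN', 'UKR', 'USA', 'VNM'}

# Codes used in the main dataset (copied from country_list)
main_codes = {'AFR', 'ARG', 'ASP', 'AUS', 'BRA', 'BRICS', 'CAN', 'CHE', 'CHL', 'CHN',
              'COL', 'DVD', 'DVG', 'EGY', 'ETH', 'EUN', 'EUR', 'GBR', 'IDN', 'IND',
              'IRN', 'ISR', 'JPN', 'KAZ', 'KOR', 'LAC', 'MEX', 'MYS', 'NGA', 'NOA',
              'NOR', 'NZL', 'OCD', 'OECD', 'PAK', 'PER', 'PHL', 'PRY', 'RUS', 'SAU',
              'THA', 'TUR', 'UKR', 'USA', 'VNM', 'WLD', 'ZAF'}

kept = sorted(veg_codes & main_codes)      # survive the isin() filter
dropped = sorted(veg_codes - main_codes)   # vegetarianism rows with no match
missing = sorted(main_codes - veg_codes)   # main-dataset codes with no vegetarianism row

print(len(kept), "kept:", kept)
print(len(dropped), "dropped:", dropped)
print(len(missing), "missing:", missing)
```

Codes such as 'AUS' and 'GBR' end up in `missing`, meaning they will have no vegetarianism value after the join, which is worth keeping in mind when reading the dashboard.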

In [18]:
#Check that the remaining countries correspond to those in the main file
location = np.unique(dataset['Location_code'])

sorted_location = np.sort(location)

print(sorted_location)

['ARG' 'BRA' 'CAN' 'CHE' 'CHL' 'CHN' 'IND' 'ISR' 'JPN' 'KOR' 'MEX' 'NOR'
 'NZL' 'PHL' 'RUS' 'THA' 'UKR' 'USA' 'VNM']


In [19]:
#Are the values consistent?
dataset.describe(include="all")

Unnamed: 0,Population_2023,Country,Location_code,Continent,Sub_continent,Rank,Vegetarianism_percentage,Vegetarian_population
count,19.0,19,19,19,19,19.0,19.0,19.0
unique,,19,19,5,11,,,
top,,India,IND,Asia,Eastern Asia,,,
freq,,1,1,8,3,,,
mean,227173500.0,,,,,38.947368,8.710526,30391110.0
std,431077600.0,,,,,41.343377,5.770316,94994120.0
min,5228100.0,,,,,1.0,1.0,425000.0
25%,28187110.0,,,,,9.5,5.0,1223000.0
50%,71801280.0,,,,,20.0,7.0,2300000.0
75%,136450000.0,,,,,53.0,11.0,13580000.0


In [20]:
print("Yes, they are")

Yes, they are
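
Beyond the summary statistics, one further check that may be useful is to compare the reported vegetarian population with the number implied by the percentage and the 2023 population. The sketch below uses only the five rows shown in the `head()` output above. The columns `Implied_vegetarians`, `Reported_to_implied` and `Within_tolerance`, as well as the 10% threshold, are assumptions made for illustration and are not part of the original notebook.

```python
import pandas as pd

# First five rows of the cleaned dataset, copied from the head() output above
sample = pd.DataFrame({
    "Country": ["India", "China", "United States", "Brazil", "Russia"],
    "Population_2023": [1428627663, 1425671352, 339996563, 216422446, 144444359],
    "Vegetarianism_percentage": [24.0, 5.0, 5.0, 14.0, 1.0],
    "Vegetarian_population": [419000000, 50000000, 16000000, 29260000, 1400000],
})

# Number of vegetarians implied by the percentage and the 2023 population
sample["Implied_vegetarians"] = (
    sample["Population_2023"] * sample["Vegetarianism_percentage"] / 100
)

# Ratio of the reported count to the implied count (1.0 means perfect agreement)
sample["Reported_to_implied"] = (
    sample["Vegetarian_population"] / sample["Implied_vegetarians"]
)

# Hypothetical threshold: flag rows that differ by more than 10%
tolerance = 0.10
sample["Within_tolerance"] = (sample["Reported_to_implied"] - 1).abs() <= tolerance

print(sample[["Country", "Implied_vegetarians", "Reported_to_implied", "Within_tolerance"]])
```

On these five rows, India and China fall outside the 10% band. This does not necessarily indicate an error: the percentage and the headcount may come from different years or surveys, but it suggests choosing one of the two columns deliberately when building the coefficient.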


In [21]:
#Export the cleaned data to a new CSV file so it can be joined to the main table in the dashboard
dataset.to_csv(r"/Users/rossana/Desktop/demoday/Vegetarianism_per_country.csv")
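
The raw file already contains an `Unnamed: 0` column, which typically appears when a CSV was saved together with its index. By default `to_csv` writes the index again, so the exported file gains one more unnamed column. If that extra column is not wanted in Tableau, passing `index=False` avoids it. The sketch below illustrates the difference on a small hypothetical frame, writing to an in-memory buffer rather than to the notebook's real path.

```python
import io

import pandas as pd

# Small illustrative frame (values taken from the first two rows shown above)
sample = pd.DataFrame({
    "Country": ["India", "China"],
    "Location_code": ["IND", "CHN"],
    "Vegetarianism_percentage": [24.0, 5.0],
})

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the data columns are written
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
reloaded = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded.columns))
```

The same argument can be added to the export cell if the index column is not needed downstream.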