# Data exploration: PROTEIN CONSUMPTION (PART 1)

- This notebook explores protein consumption data per country.


- The protein dataset contains the average daily protein consumption (in grams) per country for the period 1961-2013.


- Part 1 analyzes the differences in consumption between 1961 (the first year of data) and 2013 (the last year of data).


- It presents the following queries:
  - 10 countries with the highest protein consumption in 1961
  - 10 countries with the lowest protein consumption in 1961
  - 10 countries with the lowest protein consumption in 2013
  - 10 countries with the highest protein consumption in 2013
  - Continents of the 10 countries with the lowest protein consumption in 1961
  - Percentage per continent among the 10 countries with the lowest protein consumption in 1961
  - Continents of the 10 countries with the lowest protein consumption in 2013
  - Percentage per continent among the 10 countries with the lowest protein consumption in 2013

#### Connection to DB

In [15]:
import mysql.connector
%load_ext sql
%sql mysql+mysqldb://root:admin@localhost/food_stat

The sql extension is already loaded. To reload it, use:
  %reload_ext sql
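
The connection cell above embeds the user name and password directly in the URL. As a minimal sketch, the same URL could be assembled from environment variables instead. The variable names (`FOOD_STAT_USER`, `FOOD_STAT_PASSWORD`, `FOOD_STAT_HOST`, `FOOD_STAT_DB`) and the helper `build_mysql_url` are hypothetical and not part of the original notebook.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(env=os.environ):
    """Assemble a mysql+mysqldb URL from (hypothetical) environment variables."""
    user = env.get("FOOD_STAT_USER", "root")
    # quote_plus escapes characters such as '@' that would break the URL
    password = quote_plus(env.get("FOOD_STAT_PASSWORD", ""))
    host = env.get("FOOD_STAT_HOST", "localhost")
    database = env.get("FOOD_STAT_DB", "food_stat")
    return f"mysql+mysqldb://{user}:{password}@{host}/{database}"


# Example with an explicit mapping instead of the real environment
example_url = build_mysql_url({"FOOD_STAT_PASSWORD": "p@ss"})
print(example_url)
```

The resulting string could then be supplied to the connection cell in place of the literal URL.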


#### 10 Highest countries in protein consumption (Year 1961, 1st year of data)

In [6]:
%%sql
SELECT country, protein_supply_quantity
FROM protein_supply
WHERE year = 1961 and code is NOT NULL
ORDER BY protein_supply_quantity DESC
LIMIT 10

 * mysql+mysqldb://root:***@localhost/food_stat
10 rows affected.


country,protein_supply_quantity
Iceland,120.58
Argentina,105.72
Australia,104.89
France,103.0
Ireland,101.88
Poland,96.52
New Zealand,96.3
USSR,95.9
Finland,95.53
United States,95.21
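
The highest/lowest queries in this notebook differ only in the year and the sort direction, so they could be expressed as one parameterized helper. The sketch below uses Python's built-in `sqlite3` with an in-memory table as a stand-in for the MySQL `food_stat` database. The helper name `extreme_countries`, the country codes, and the row labeled "Example aggregate" are illustrative assumptions; the quantities for the real countries are taken from the outputs in this notebook.

```python
import sqlite3


def extreme_countries(conn, year, n=10, highest=True):
    """Return the n countries with the highest (or lowest) protein supply in a year."""
    # The direction is chosen from two fixed keywords, never from user input
    direction = "DESC" if highest else "ASC"
    query = f"""
        SELECT country, protein_supply_quantity
        FROM protein_supply
        WHERE year = ? AND code IS NOT NULL
        ORDER BY protein_supply_quantity {direction}
        LIMIT ?
    """
    return conn.execute(query, (year, n)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein_supply ("
    "country TEXT, code TEXT, year INTEGER, protein_supply_quantity REAL)"
)
conn.executemany(
    "INSERT INTO protein_supply VALUES (?, ?, ?, ?)",
    [
        ("Iceland", "ISL", 1961, 120.58),
        ("Argentina", "ARG", 1961, 105.72),
        ("Maldives", "MDV", 1961, 30.53),
        ("Guinea-Bissau", "GNB", 1961, 31.53),
        ("Iceland", "ISL", 2013, 133.54),
        # Hypothetical row without a code, to show the effect of the NULL filter
        ("Example aggregate", None, 1961, 200.0),
    ],
)

top_two = extreme_countries(conn, 1961, n=2)
bottom_two = extreme_countries(conn, 1961, n=2, highest=False)
print(top_two)
print(bottom_two)
```

The row without a code is excluded by `code IS NOT NULL`, which is the filter the notebook's queries apply as well.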


#### 10 Lowest countries in protein consumption (Year 1961, 1st year of data)

In [11]:
%%sql
SELECT country, protein_supply_quantity
FROM protein_supply
WHERE year = 1961 and code is NOT NULL
ORDER BY protein_supply_quantity ASC
LIMIT 10;

 * mysql+mysqldb://root:***@localhost/food_stat
10 rows affected.


country,protein_supply_quantity
Maldives,30.53
Guinea-Bissau,31.53
Central African Republic,32.78
Mozambique,33.73
Saint Kitts and Nevis,33.83
Congo,35.03
Indonesia,35.22
Angola,36.61
Myanmar,37.46
Dominican Republic,37.54


#### 10 Lowest countries in protein consumption (Year 2013, last year of data)

In [9]:
%%sql
SELECT country, protein_supply_quantity
FROM protein_supply
WHERE year = 2013 and code is NOT NULL
ORDER BY protein_supply_quantity ASC
LIMIT 10;

 * mysql+mysqldb://root:***@localhost/food_stat
10 rows affected.


country,protein_supply_quantity
Liberia,37.66
Guinea-Bissau,44.04
Mozambique,45.7
Central African Republic,46.06
Madagascar,46.67
Haiti,47.72
Zimbabwe,48.35
Congo,51.66
Uganda,52.68
Sao Tome and Principe,53.08


#### 10 Highest countries in protein consumption (Year 2013, last year of data)

In [10]:
%%sql
SELECT country, protein_supply_quantity
FROM protein_supply
WHERE year = 2013 and code is NOT NULL
ORDER BY protein_supply_quantity DESC
LIMIT 10;

 * mysql+mysqldb://root:***@localhost/food_stat
10 rows affected.


country,protein_supply_quantity
Iceland,133.54
Israel,128.14
Lithuania,124.49
Maldives,122.43
Finland,117.72
Luxembourg,113.88
Montenegro,112.07
Netherlands,111.72
Albania,111.42
Norway,110.9
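
With the four lists available, a quick way to see which countries appear in more than one of them is a set intersection. The sketch below copies the country names from the outputs above into Python sets; the variable names are illustrative.

```python
top_1961 = {
    "Iceland", "Argentina", "Australia", "France", "Ireland",
    "Poland", "New Zealand", "USSR", "Finland", "United States",
}
top_2013 = {
    "Iceland", "Israel", "Lithuania", "Maldives", "Finland",
    "Luxembourg", "Montenegro", "Netherlands", "Albania", "Norway",
}
bottom_1961 = {
    "Maldives", "Guinea-Bissau", "Central African Republic", "Mozambique",
    "Saint Kitts and Nevis", "Congo", "Indonesia", "Angola", "Myanmar",
    "Dominican Republic",
}
bottom_2013 = {
    "Liberia", "Guinea-Bissau", "Mozambique", "Central African Republic",
    "Madagascar", "Haiti", "Zimbabwe", "Congo", "Uganda",
    "Sao Tome and Principe",
}

# Countries present in both years' lists, and those that moved from bottom to top
stayed_top = sorted(top_1961 & top_2013)
stayed_bottom = sorted(bottom_1961 & bottom_2013)
bottom_to_top = sorted(bottom_1961 & top_2013)

print(stayed_top)      # ['Finland', 'Iceland']
print(stayed_bottom)   # ['Central African Republic', 'Congo', 'Guinea-Bissau', 'Mozambique']
print(bottom_to_top)   # ['Maldives']
```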


#### Continents of the 10 lowest countries in protein consumption in 1961

Because the continent is filled in on only one (arbitrary) row per country, and to simplify the queries that follow, we'll create a view that keeps just those rows.

##### Step 1: Create View

In [31]:
%%sql
CREATE OR REPLACE VIEW country_continent AS
SELECT country, continent
FROM food_stat.share_food_expenditure
WHERE continent is NOT NULL

 * mysql+mysqldb://root:***@localhost/food_stat
0 rows affected.


[]

In [17]:
%%sql
SELECT *
FROM country_continent
LIMIT 10

 * mysql+mysqldb://root:***@localhost/food_stat
10 rows affected.


country,continent
Abkhazia,Asia
Afghanistan,Asia
Akrotiri and Dhekelia,Asia
Albania,Europe
Algeria,Africa
American Samoa,Oceania
Andorra,Europe
Angola,Africa
Anguilla,North America
Antarctica,Antarctica


##### Step 2: Query

In [20]:
%%sql
SELECT continent, count(*)  
FROM (
    SELECT t2.continent
    FROM food_stat.protein_supply as t1
    JOIN country_continent as t2
    ON (t1.country = t2.country)
    WHERE t1.code is NOT NULL and year = 1961
    ORDER BY t1.protein_supply_quantity ASC LIMIT 10) AS Subquery
GROUP BY continent

 * mysql+mysqldb://root:***@localhost/food_stat
3 rows affected.


continent,count(*)
Africa,5
Asia,3
North America,2


#### Percentage per continent (10 lowest countries in protein consumption in 1961)

In [26]:
%%sql
SELECT continent,
       CONCAT(ROUND(count(*) * 100 /
                      (SELECT count(*)
                       FROM
                         (SELECT t2.continent
                          FROM food_stat.protein_supply AS t1
                          JOIN country_continent AS t2 ON (t1.country = t2.country)
                          WHERE t1.code IS NOT NULL
                            AND YEAR = 1961
                          ORDER BY t1.protein_supply_quantity ASC
                          LIMIT 10) AS Subquery)), '%') AS percentage
FROM
  (SELECT t2.continent
   FROM food_stat.protein_supply AS t1
   JOIN country_continent AS t2 ON (t1.country = t2.country)
   WHERE t1.code IS NOT NULL
     AND YEAR = 1961
   ORDER BY t1.protein_supply_quantity ASC
   LIMIT 10) AS Subquery
GROUP BY continent;

 * mysql+mysqldb://root:***@localhost/food_stat
3 rows affected.


continent,percentage
Africa,50%
Asia,30%
North America,20%
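
The percentage query repeats the same subquery twice, once for the numerator and once for the total. One possible simplification is a common table expression (`WITH`), which MySQL supports from version 8.0. The sketch below runs such a query through Python's built-in `sqlite3` as a stand-in for MySQL. The quantities come from the 1961 outputs above, while the `"XXX"` code is a placeholder and the per-country continent assignments are assumptions chosen to match the counts shown above.

```python
import sqlite3

# (country, protein supply in 1961, continent)
rows = [
    ("Maldives", 30.53, "Asia"),
    ("Guinea-Bissau", 31.53, "Africa"),
    ("Central African Republic", 32.78, "Africa"),
    ("Mozambique", 33.73, "Africa"),
    ("Saint Kitts and Nevis", 33.83, "North America"),
    ("Congo", 35.03, "Africa"),
    ("Indonesia", 35.22, "Asia"),
    ("Angola", 36.61, "Africa"),
    ("Myanmar", 37.46, "Asia"),
    ("Dominican Republic", 37.54, "North America"),
    # Two high-consumption countries that LIMIT 10 should exclude
    ("Iceland", 120.58, "Europe"),
    ("Argentina", 105.72, "South America"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein_supply ("
    "country TEXT, code TEXT, year INTEGER, protein_supply_quantity REAL)"
)
conn.execute("CREATE TABLE country_continent (country TEXT, continent TEXT)")
for country, quantity, continent in rows:
    # "XXX" is a placeholder code; only its being non-NULL matters here
    conn.execute(
        "INSERT INTO protein_supply VALUES (?, 'XXX', 1961, ?)", (country, quantity)
    )
    conn.execute("INSERT INTO country_continent VALUES (?, ?)", (country, continent))

query = """
    WITH lowest AS (
        SELECT t2.continent
        FROM protein_supply AS t1
        JOIN country_continent AS t2 ON t1.country = t2.country
        WHERE t1.code IS NOT NULL AND t1.year = 1961
        ORDER BY t1.protein_supply_quantity ASC
        LIMIT 10
    )
    SELECT continent,
           COUNT(*) AS countries,
           ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM lowest)) AS percentage
    FROM lowest
    GROUP BY continent
    ORDER BY countries DESC
"""
shares = conn.execute(query).fetchall()
for continent, countries, percentage in shares:
    print(f"{continent}: {countries} countries, {percentage:.0f}%")
```

The subquery is written once and referenced twice, so changing the year or the limit only requires editing one place.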


#### Continents of the 10 lowest countries in protein consumption in 2013

In [27]:
%%sql
SELECT continent,
       count(*)
FROM
  (SELECT t2.continent
   FROM food_stat.protein_supply AS t1
   JOIN country_continent AS t2 ON (t1.country = t2.country)
   WHERE t1.code IS NOT NULL
     AND YEAR = 2013
   ORDER BY t1.protein_supply_quantity ASC
   LIMIT 10) AS Subquery
GROUP BY continent

 * mysql+mysqldb://root:***@localhost/food_stat
2 rows affected.


continent,count(*)
Africa,9
North America,1


#### Percentage per continent (10 lowest countries in protein consumption in 2013)

In [33]:
%%sql
SELECT continent,
       CONCAT(ROUND(count(*) * 100 /
                      (SELECT count(*)
                       FROM
                         (SELECT t2.continent
                          FROM food_stat.protein_supply AS t1
                          JOIN country_continent AS t2 ON (t1.country = t2.country)
                          WHERE t1.code IS NOT NULL
                            AND YEAR = 2013
                          ORDER BY t1.protein_supply_quantity ASC
                          LIMIT 10) AS Subquery)), '%') AS percentage
FROM
  (SELECT t2.continent
   FROM food_stat.protein_supply AS t1
   JOIN country_continent AS t2 ON (t1.country = t2.country)
   WHERE t1.code IS NOT NULL
     AND YEAR = 2013
   ORDER BY t1.protein_supply_quantity ASC
   LIMIT 10) AS Subquery
GROUP BY continent;

 * mysql+mysqldb://root:***@localhost/food_stat
2 rows affected.


continent,percentage
Africa,90%
North America,10%


#### Conclusion

- The first part of the protein consumption analysis compared the consumption in 1961 (first year of the data) and 2013 (last year with data).


- In 1961 the countries with the highest consumption were Iceland, Argentina and Australia. The countries with the lowest consumption were Maldives, Guinea-Bissau and Central African Republic.


- In 2013 the countries with the highest consumption were Iceland, Israel and Lithuania. The countries with the lowest consumption were Liberia, Guinea-Bissau and Mozambique.


- Some points stand out in these statistics. First, despite the 52-year gap between the first and the last year of data, Iceland kept first place in protein consumption. Likewise, Guinea-Bissau kept the second-lowest consumption. Finally, Maldives went from the lowest consumption in 1961 to the fourth-highest in 2013.


- Protein consumption rose at both extremes, which points to a general increase. The lowest value went from 30.53 g/day in 1961 to 37.66 g/day in 2013, and the highest from 120.58 g/day in 1961 to 133.54 g/day in 2013.


- The continental distribution of the ten lowest-consuming countries changed over the 52-year gap. In 1961 they were spread across Africa (50%), Asia (30%) and North America (20%). In 2013 they were concentrated in Africa (90%), with North America accounting for the remaining 10%.
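
To put numbers on the increases mentioned above, the short sketch below computes the absolute and relative change between 1961 and 2013 for the lowest and highest values, plus the growth factor for Maldives. All figures are taken from the query outputs in this notebook; the helper name `change` is illustrative.

```python
lowest = {1961: 30.53, 2013: 37.66}
highest = {1961: 120.58, 2013: 133.54}
maldives = {1961: 30.53, 2013: 122.43}


def change(values):
    """Return (absolute change in g/day, relative change in %) from 1961 to 2013."""
    absolute = values[2013] - values[1961]
    relative = absolute / values[1961] * 100
    return round(absolute, 2), round(relative, 1)


lowest_change = change(lowest)
highest_change = change(highest)
maldives_factor = round(maldives[2013] / maldives[1961], 2)

print(lowest_change)    # (7.13, 23.4)
print(highest_change)   # (12.96, 10.7)
print(maldives_factor)  # 4.01
```

In relative terms the lowest value grew more than the highest (about 23% versus about 11%), and consumption in Maldives roughly quadrupled.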