# Data analysis with Numpy


Read the world_alcohol.csv dataset into the world_alcohol variable (see the previous section if necessary).


#### Mission:
- Extract the 3rd column from world_alcohol and compare it to the country "Canada". Assign the result to the variable countries_canada.
- Extract the first column from world_alcohol and compare it to the character string "1984". Assign the result to the variable years_1984.


In [4]:
from io import StringIO
import numpy as np

world_alcohol = np.genfromtxt("world_alcohol.csv", delimiter=",", dtype=str)


In [4]:
countries = world_alcohol[:,2]
countries_canada = world_alcohol[(countries == "Canada")]
countries_canada

array([['1984', 'Americas', 'Canada', 'Spirits', '3.35'],
       ['1989', 'Americas', 'Canada', 'Wine', '1.27'],
       ['1984', 'Americas', 'Canada', 'Beer', '5'],
       ['1985', 'Americas', 'Canada', 'Beer', '4.94'],
       ['1987', 'Americas', 'Canada', 'Wine', '1.3'],
       ['1987', 'Americas', 'Canada', 'Beer', '4.83'],
       ['1986', 'Americas', 'Canada', 'Other', ''],
       ['1986', 'Americas', 'Canada', 'Spirits', '3.11'],
       ['1985', 'Americas', 'Canada', 'Spirits', '3.21'],
       ['1985', 'Americas', 'Canada', 'Other', ''],
       ['1986', 'Americas', 'Canada', 'Beer', '4.87'],
       ['1984', 'Americas', 'Canada', 'Wine', '1.24'],
       ['1989', 'Americas', 'Canada', 'Spirits', '2.91'],
       ['1984', 'Americas', 'Canada', 'Other', ''],
       ['1985', 'Americas', 'Canada', 'Wine', '1.29'],
       ['1987', 'Americas', 'Canada', 'Spirits', '2.99'],
       ['1989', 'Americas', 'Canada', 'Beer', '4.82'],
       ['1989', 'Americas', 'Canada', 'Other', ''],
       ['19

In [6]:
year = world_alcohol[:,0]
years_1984 = world_alcohol[(year == "1984")]
years_1984

array([['1984', 'Africa', 'Nigeria', 'Other', '6.1'],
       ['1984', 'Eastern Mediterranean', 'Afghanistan', 'Other', '0'],
       ['1984', 'Americas', 'Costa Rica', 'Wine', '0.06'],
       ...,
       ['1984', 'Europe', 'Latvia', 'Spirits', '7.5'],
       ['1984', 'Africa', 'Angola', 'Wine', '0.57'],
       ['1984', 'Africa', 'Central African Republic', 'Wine', '0.46']],
      dtype='<U52')

## Select items


- Compare the 3rd column of world_alcohol to the character string "Algeria".
- Assign the result to the variable country_is_algeria.
- Select only the lines of world_alcohol for which country_is_algeria is True
- Assign the result to the country_algeria variable.
- Display the results
- Do the same work to recover all the lines corresponding to the year "1984". Assign the result to the variable years_1984.


In [7]:
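countries1 = world_alcohol[:,2]
# Boolean array: True where the country is "Algeria"
country_is_algeria = (countries1 == "Algeria")
# Keep only the rows for which the comparison is True
country_algeria = world_alcohol[country_is_algeria]
country_algeria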

array([['1984', 'Africa', 'Algeria', 'Spirits', '0.01'],
       ['1987', 'Africa', 'Algeria', 'Beer', '0.17'],
       ['1987', 'Africa', 'Algeria', 'Spirits', '0.01'],
       ['1986', 'Africa', 'Algeria', 'Wine', '0.1'],
       ['1984', 'Africa', 'Algeria', 'Other', '0'],
       ['1989', 'Africa', 'Algeria', 'Beer', '0.16'],
       ['1989', 'Africa', 'Algeria', 'Spirits', '0.01'],
       ['1989', 'Africa', 'Algeria', 'Wine', '0.23'],
       ['1986', 'Africa', 'Algeria', 'Spirits', '0.01'],
       ['1984', 'Africa', 'Algeria', 'Wine', '0.12'],
       ['1985', 'Africa', 'Algeria', 'Beer', '0.19'],
       ['1985', 'Africa', 'Algeria', 'Other', '0'],
       ['1986', 'Africa', 'Algeria', 'Beer', '0.18'],
       ['1985', 'Africa', 'Algeria', 'Wine', '0.11'],
       ['1986', 'Africa', 'Algeria', 'Other', '0'],
       ['1989', 'Africa', 'Algeria', 'Other', '0'],
       ['1987', 'Africa', 'Algeria', 'Other', '0'],
       ['1984', 'Africa', 'Algeria', 'Beer', '0.2'],
       ['1985', 'Africa', 'A

In [8]:
year = world_alcohol[:,0]
years_1984 = world_alcohol[(year == "1984")]
years_1984

array([['1984', 'Africa', 'Nigeria', 'Other', '6.1'],
       ['1984', 'Eastern Mediterranean', 'Afghanistan', 'Other', '0'],
       ['1984', 'Americas', 'Costa Rica', 'Wine', '0.06'],
       ...,
       ['1984', 'Europe', 'Latvia', 'Spirits', '7.5'],
       ['1984', 'Africa', 'Angola', 'Wine', '0.57'],
       ['1984', 'Africa', 'Central African Republic', 'Wine', '0.46']],
      dtype='<U52')

## Perform comparisons with several conditions


- Select the lines whose country is "Algeria" and whose year is "1986":
> - Create this double comparison and assign the result to the variable is_algeria_and_1986.
> - Use is_algeria_and_1986 to select the corresponding lines in the world_alcohol table.
> - Assign the result to the rows_with_algeria_and_1986 variable.
> - Display the result.


In [27]:
countries2 = world_alcohol[:,2]
country1 = (countries2 == "Algeria")
print(country1)
year1986 = world_alcohol[:,0]
year = (year1986 == "1986")
print(year)
# Combine the two boolean arrays element-wise with &
is_algeria_and_1986 = country1 & year
# Index with square brackets; world_alcohol(...) would raise
# TypeError: 'numpy.ndarray' object is not callable
rows_with_algeria_and_1986 = world_alcohol[is_algeria_and_1986]
rows_with_algeria_and_1986

## Replace values in a Numpy array


- Create a Numpy array world_alcohol_2 as a copy of world_alcohol, so that the original stays unchanged.
- Replace all the years "1986" in the first column of world_alcohol_2 with "2018".
- Replace all "Wine" alcohols in the 4th column of world_alcohol_2 with "Beer".


In [5]:
world_alcohol_2 = world_alcohol.copy()

In [11]:
world_alcohol_2[:,0]

array(['Year', '1986', '1986', ..., '1986', '1987', '1986'], dtype='<U52')

In [12]:
# np.where() returns a new array and accepts no inplace argument
# (TypeError: where() got an unexpected keyword argument 'inplace'),
# so assign through a boolean mask instead.
world_alcohol_2[world_alcohol_2[:,0] == '1986', 0] = '2018'
world_alcohol_2[world_alcohol_2[:,3] == 'Wine', 3] = 'Beer'

## Replace empty strings


- Compare all the elements of the 5th column of world_alcohol with the empty character string i.e. ''. Assign the result to the variable is_value_empty.
- Select all the values in the 5th column of world_alcohol for which is_value_empty is equal to True and replace them with the character string '0'.
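
The two steps above can be sketched as follows. This is a minimal example, not the notebook's own solution: the four-row array is a small stand-in built from rows shown in the outputs earlier, so that the snippet runs without the CSV file.

```python
import numpy as np

# Small stand-in for world_alcohol (assumption: a few rows copied from the
# outputs above; the real array comes from np.genfromtxt on the CSV file).
world_alcohol = np.array([
    ['1984', 'Americas', 'Canada', 'Spirits', '3.35'],
    ['1986', 'Americas', 'Canada', 'Other', ''],
    ['1985', 'Americas', 'Canada', 'Other', ''],
    ['1986', 'Africa', 'Algeria', 'Wine', '0.1'],
])

# Boolean array: True where the 5th column holds an empty string
is_value_empty = world_alcohol[:, 4] == ''

# Assign '0' at those positions; this modifies world_alcohol in place
world_alcohol[is_value_empty, 4] = '0'

print(world_alcohol[:, 4])
```

Indexing with `[is_value_empty, 4]` on the left-hand side of the assignment writes directly into the original array, whereas `world_alcohol[is_value_empty]` on its own would return a copy.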


## Convert data types


- Extract the 5th column from world_alcohol and assign the result to the variable alcohol_consumption.
- Use the astype() method to convert alcohol_consumption to floating-point numbers (float).
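
A minimal sketch of the conversion, assuming the empty strings have already been replaced with '0' (an empty string cannot be converted and makes `astype(float)` raise a `ValueError`). The sample values are illustrative, not the full column.

```python
import numpy as np

# Illustrative 5th-column values (assumption: already cleaned of '')
alcohol_consumption = np.array(['3.35', '0', '1.27', '0.1'])

# astype returns a new array with the requested dtype
alcohol_consumption = alcohol_consumption.astype(float)

print(alcohol_consumption.dtype)
print(alcohol_consumption)
```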


## Perform mathematical calculations with Numpy


- Use the sum() method to calculate the sum of the values of alcohol_consumption. Assign the result to the total_alcohol variable.
- Use the mean() method to calculate the average of the alcohol_consumption values. Assign the result to the variable average_alcohol.
- Show the results.
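
These calculations might look like the sketch below. The float array is a small illustrative sample, so the printed numbers are not the totals for the real dataset.

```python
import numpy as np

# Illustrative values (assumption: alcohol_consumption is already float)
alcohol_consumption = np.array([3.35, 0.0, 1.27, 0.1])

total_alcohol = alcohol_consumption.sum()      # sum of all elements
average_alcohol = alcohol_consumption.mean()   # arithmetic mean

print(total_alcohol)
print(average_alcohol)
```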


## Calculate the total annual consumption per capita for a given country


- Create an array named canada_1986 that contains all the lines of world_alcohol corresponding to the year "1986" and to the country "Canada".
- Extract the 5th column from canada_1986, replace any empty character string ('') with '0' and convert the column to decimal (float). Assign the result to the variable canada_alcohol.
- Calculate the sum of canada_alcohol. Assign the result to the variable total_canadian_drinking.
- Show the result.
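
One possible way to chain these steps is sketched here. The sample array is an assumption: it reuses a handful of rows from the outputs above and does not contain every Canadian row for 1986, so the total it prints is not the real figure.

```python
import numpy as np

# Small stand-in for world_alcohol (assumption: partial sample of rows)
world_alcohol = np.array([
    ['1986', 'Americas', 'Canada', 'Other', ''],
    ['1986', 'Americas', 'Canada', 'Spirits', '3.11'],
    ['1986', 'Americas', 'Canada', 'Beer', '4.87'],
    ['1985', 'Americas', 'Canada', 'Beer', '4.94'],
    ['1986', 'Africa', 'Algeria', 'Wine', '0.1'],
])

# Both conditions must hold, so combine the boolean arrays with &
is_canada_1986 = (world_alcohol[:, 2] == 'Canada') & (world_alcohol[:, 0] == '1986')
canada_1986 = world_alcohol[is_canada_1986]

# 5th column: replace '' with '0', then convert to float
canada_alcohol = canada_1986[:, 4]
canada_alcohol[canada_alcohol == ''] = '0'
canada_alcohol = canada_alcohol.astype(float)

total_canadian_drinking = canada_alcohol.sum()
print(total_canadian_drinking)
```

The parentheses around each comparison are required, because `&` binds more tightly than `==` in Python.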


## Calculate consumption for each country


- First, create an empty dictionary named totals that will hold every country and its associated alcohol consumption.
- Then select the lines of world_alcohol corresponding to the given year, say 1989. Assign the result to the year variable.
- Build a list named countries that contains all the countries.
- Browse all the countries in the list using a loop. For each country:
> - Select the year lines corresponding to this country
> - Assign the result to the country_consumption variable
> - Extract the 5th column from country_consumption
> - Replace any empty character string in this column with '0'
> - Convert the column to decimal (float)
> - Calculate the sum of the column
> - Add the sum to the totals dictionary, with the country name as the key and the sum as the value.
- Display the totals dictionary.
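
The loop described above could be written roughly as follows. This is a sketch under assumptions: the sample array holds only a few rows taken from the outputs above, and `np.unique` is one convenient way (not necessarily the intended one) to obtain each country once.

```python
import numpy as np

# Small stand-in for world_alcohol (assumption: partial sample of rows)
world_alcohol = np.array([
    ['1989', 'Americas', 'Canada', 'Wine', '1.27'],
    ['1989', 'Americas', 'Canada', 'Spirits', '2.91'],
    ['1989', 'Americas', 'Canada', 'Beer', '4.82'],
    ['1989', 'Americas', 'Canada', 'Other', ''],
    ['1989', 'Africa', 'Algeria', 'Beer', '0.16'],
    ['1989', 'Africa', 'Algeria', 'Spirits', '0.01'],
    ['1989', 'Africa', 'Algeria', 'Wine', '0.23'],
    ['1989', 'Africa', 'Algeria', 'Other', '0'],
    ['1984', 'Africa', 'Algeria', 'Beer', '0.2'],
])

totals = {}

# Rows for the chosen year
year = world_alcohol[world_alcohol[:, 0] == '1989']

# Each country name once (np.unique also sorts them)
countries = np.unique(year[:, 2])

for country in countries:
    # Rows of the chosen year for this country
    country_consumption = year[year[:, 2] == country]
    # 5th column: replace '' with '0', convert to float, then sum
    consumption = country_consumption[:, 4]
    consumption[consumption == ''] = '0'
    consumption = consumption.astype(float)
    # Store plain Python types so the dictionary prints cleanly
    totals[str(country)] = float(consumption.sum())

print(totals)
```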


## Find the country that consumes the most alcohol


- Create a variable highest_value which will keep in memory the largest value of the totals dictionary. We set it to 0 to start.
- Create a similar variable which will be named highest_key which will keep in memory the name of the country associated with the highest value. We set it to None.
- Browse each country of the totals dictionary:
> - If the value associated with the country is greater than highest_value, assign the value in question to the variable highest_value and assign the corresponding key (name of the country) to the variable highest_key.
- Display the country which consumes the most alcohol (variable highest_key)
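
A minimal sketch of this search, using a two-entry totals dictionary as a placeholder (the values are illustrative, not results for the full dataset):

```python
# Placeholder dictionary (assumption: illustrative values only)
totals = {'Algeria': 0.4, 'Canada': 9.0}

highest_value = 0
highest_key = None

for country, value in totals.items():
    # Keep the largest value seen so far and the country it belongs to
    if value > highest_value:
        highest_value = value
        highest_key = country

print(highest_key)
```

The built-in expression `max(totals, key=totals.get)` finds the same key in one line, but the explicit loop shows what happens at each step.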
