In [1]:
import requests
import pymongo

### Info on dataset and source
Source: [MENA energy indicators 2017 on energydata.info](https://energydata.info/dataset/mena-energy-indicators-2017)

"Various indicators for MENA countries to get an overview of the countries energy and economic profiles. The indicators are organized in 6 categories: economic indicators, energy indicators, oil indicators, gas indicators, electricity indicators & energy efficiency indicators. Each indicator includes name, unit, year, and value for that year. Main sources are the World Bank Group, the IMF, KNOEMA aggregating platform and EIA."

In [2]:
mena_resp = requests.get('https://development-data-hub-s3-public.s3.amazonaws.com/ddhfiles/145369/pa-retp-indicators.json')
mena_json = mena_resp.json()
mena_resp

<Response [200]>

In [3]:
# list the names of all countries in the dataset
[country['name'] for country in mena_json['countries']]

['algeria',
 'bahrain',
 'egypt',
 'iraq',
 'jordan',
 'kuwait',
 'lebanon',
 'libya',
 'morocco',
 'oman',
 'qatar',
 'saudi arabia',
 'syria',
 'tunisia',
 'uae',
 'west bank and gaza',
 'yemen']

In [4]:
# show all data categories (subjects) for the first country, algeria
[item['type'] for item in mena_json['countries'][0]['subject']]

['economic indicators',
 'energy indicators',
 'oil indicators',
 'gas indicators',
 'electricity indicators',
 'energy efficiency indicators']
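Several cells reach into this nested structure by position, e.g. `mena_json['countries'][0]['subject'][1]['indicator'][6]`, which would silently return the wrong indicator if a list were ordered differently for some country. A minimal sketch of a name-based lookup instead, assuming every country follows the layout shown above (`subject` entries with a `type`, each holding an `indicator` list of `name`/`unit`/`values` dicts, with empty strings for missing years). `find_indicator` and `numeric_values` are hypothetical helpers, not part of the dataset or of PyMongo:

```python
def find_indicator(country, subject_type, indicator_name):
    """Return the indicator dict with the given name, or None if it is absent.

    Assumes this dataset's layout: a country dict has a 'subject' list whose
    entries carry a 'type' and an 'indicator' list of {'name', 'unit', 'values'}.
    """
    for subject in country.get('subject', []):
        if subject.get('type') != subject_type:
            continue
        for indicator in subject.get('indicator', []):
            if indicator.get('name') == indicator_name:
                return indicator
    return None


def numeric_values(indicator):
    """Map year -> value, skipping the empty strings used for missing years."""
    return {v['year']: v['value'] for v in indicator['values'] if v['value'] != ''}


# a tiny hand-built country record shaped like the real data
sample = {
    'name': 'example',
    'subject': [{
        'type': 'energy indicators',
        'indicator': [{
            'name': 'Gross energy production - Solar',
            'unit': '%',
            'values': [{'year': 2014, 'value': 0.5}, {'year': 2015, 'value': ''}],
        }],
    }],
}
solar = find_indicator(sample, 'energy indicators', 'Gross energy production - Solar')
print(numeric_values(solar))  # {2014: 0.5}
```

Against the real data, `find_indicator(mena_json['countries'][0], 'energy indicators', 'Gross energy production - Solar')` should return the same dict as indexing with `['subject'][1]['indicator'][6]`, but keeps working if positions shift.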

In [5]:
# subject 0 (economic indicators) for the first country, algeria:
#   3 is GDP per capita
#   5 is population
#   7 is HDI (Human Development Index)
#   13 is CO2 emissions intensity
#   14 is CO2 per capita
econ_indics = [3, 5, 7, 13, 14]
[mena_json['countries'][0]['subject'][0]['indicator'][sec] for sec in econ_indics]

[{'name': 'GDP per capita',
  'unit': 'US $',
  'values': [{'year': 2010, 'value': 4481},
   {'year': 2011, 'value': 5431},
   {'year': 2012, 'value': 5574},
   {'year': 2013, 'value': 5476},
   {'year': 2014, 'value': 5459},
   {'year': 2015, 'value': 4175},
   {'year': 2016, 'value': 4129},
   {'year': 2017, 'value': ''}]},
 {'name': 'Population',
  'unit': 'million',
  'values': [{'year': 2010, 'value': 36.04},
   {'year': 2011, 'value': 36.72},
   {'year': 2012, 'value': 37.44},
   {'year': 2013, 'value': 38.19},
   {'year': 2014, 'value': 38.93},
   {'year': 2015, 'value': 39.67},
   {'year': 2016, 'value': 40.38},
   {'year': 2017, 'value': ''}]},
 {'name': 'Human Development Index',
  'unit': '',
  'values': [{'year': 2010, 'value': 0.71},
   {'year': 2011, 'value': 0.72},
   {'year': 2012, 'value': 0.72},
   {'year': 2013, 'value': 0.72},
   {'year': 2014, 'value': 0.74},
   {'year': 2015, 'value': ''},
   {'year': 2016, 'value': ''},
   {'year': 2017, 'value': ''}]},
 {'name':

In [6]:
# subject 1 (energy indicators), using 0-based indices:
#   0 is gross energy production
#   1-3 are fossil fuel production (crude oil, natural gas, coal)
#   4 is nuclear
#   5 is gross energy production from renewables
#   6-9 are solar, wind, hydropower, and other energy production
[mena_json['countries'][0]['subject'][1]['indicator'][sec] for sec in range(10)]

[{'name': 'Gross energy production',
  'unit': 'ktoe',
  'values': [{'year': 2010, 'value': 150510.895},
   {'year': 2011, 'value': 145832.565},
   {'year': 2012, 'value': 143763.68},
   {'year': 2013, 'value': 137669.492},
   {'year': 2014, 'value': 143197.3},
   {'year': 2015, 'value': ''},
   {'year': 2016, 'value': ''},
   {'year': 2017, 'value': ''}]},
 {'name': 'Gross energy production - Crude Oil and Oil Products',
  'unit': '%',
  'values': [{'year': 2010, 'value': 52.15272223316458},
   {'year': 2011, 'value': 52.25005539743472},
   {'year': 2012, 'value': 49.515179355453334},
   {'year': 2013, 'value': 49.92558990484253},
   {'year': 2014, 'value': 50.96184564932439},
   {'year': 2015, 'value': ''},
   {'year': 2016, 'value': ''},
   {'year': 2017, 'value': ''}]},
 {'name': 'Gross energy production - Natural Gas',
  'unit': '%',
  'values': [{'year': 2010, 'value': 47.80292217383998},
   {'year': 2011, 'value': 47.70930415987677},
   {'year': 2012, 'value': 50.43678208571178}

In [7]:
# subject 4 (electricity indicators):
#   8 is cost of electricity ($/kWh)
#   9 is electricity price ($/kWh)
[mena_json['countries'][0]['subject'][4]['indicator'][sec] for sec in (8, 9)]

[{'name': 'Cost of electricity',
  'unit': '$/kWh',
  'values': [{'year': 2010, 'value': ''},
   {'year': 2011, 'value': ''},
   {'year': 2012, 'value': 0.2016},
   {'year': 2013, 'value': 0.2157},
   {'year': 2014, 'value': 0.1984},
   {'year': 2015, 'value': 0.1563},
   {'year': 2016, 'value': 0.1184},
   {'year': 2017, 'value': ''}]},
 {'name': 'Electricity price',
  'unit': '$/kWh',
  'values': [{'year': 2010, 'value': ''},
   {'year': 2011, 'value': ''},
   {'year': 2012, 'value': 0.03868},
   {'year': 2013, 'value': 0.04012},
   {'year': 2014, 'value': 0.04014},
   {'year': 2015, 'value': 0.03229},
   {'year': 2016, 'value': 0.03964},
   {'year': 2017, 'value': ''}]}]

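## Restructuring the dataset into something Mongo will accept
Passing the `countries` array where a single document is expected fails with "Operation passed in cannot be an Array", so each country dict is inserted as its own document instead.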

In [8]:
# connect to the database for this assignment
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client.hw3
# collection for the country documents (created on first insert)
countries = db.countries
# start from an empty collection so re-running this cell doesn't duplicate documents
countries.delete_many({})
# insert each country dict as its own document
for country in mena_json['countries']:
    countries.insert_one(country)
    print(f'inserted {country["name"]} into collection.')

inserted algeria into collection.
inserted bahrain into collection.
inserted egypt into collection.
inserted iraq into collection.
inserted jordan into collection.
inserted kuwait into collection.
inserted lebanon into collection.
inserted libya into collection.
inserted morocco into collection.
inserted oman into collection.
inserted qatar into collection.
inserted saudi arabia into collection.
inserted syria into collection.
inserted tunisia into collection.
inserted uae into collection.
inserted west bank and gaza into collection.
inserted yemen into collection.


## Exploring the data: motivating questions
1. Which countries in the dataset have the greatest percentage of their total energy output as solar energy?

In [9]:
mena_json['countries'][0]['subject'][1]['indicator'][6]

{'name': 'Gross energy production - Solar',
 'unit': '%',
 'values': [{'year': 2010, 'value': 0},
  {'year': 2011, 'value': 0},
  {'year': 2012, 'value': 0},
  {'year': 2013, 'value': 0},
  {'year': 2014, 'value': 0},
  {'year': 2015, 'value': ''},
  {'year': 2016, 'value': ''},
  {'year': 2017, 'value': ''}]}

In [10]:
# 1. Which countries have the largest share of solar in their gross energy production?
# drill down to get solar amounts per country!
most_solar = db.countries.aggregate([
    {
        # expand elements of subject list for each country
        "$unwind": "$subject"
    },
    {
        # extract 'energy indicators' subject
        "$match": {
            "subject.type": "energy indicators"
        }
    },
    {
        # expand elements of indicator list for this subject type
        "$unwind": "$subject.indicator"
    },
    {
        # extract 'Gross energy production - Solar' indicator
        "$match": {
            "subject.indicator.name": "Gross energy production - Solar"
        }
    },
    {
        # expand elements of year list for this indicator
        "$unwind": "$subject.indicator.values"
    },
    {
        # exclude years with empty string as value
        "$match": {"subject.indicator.values.value": {"$ne": ""}}
    },
    {
        # for each country, keep its peak annual solar share (% of gross energy production)
        "$group": {
            "_id": "$name",
            "maxSolarProduction": {"$max": "$subject.indicator.values.value"}
        }
    },
    {
        "$sort": {
            "maxSolarProduction": -1
        }
    }
])

[country for country in most_solar]

[{'_id': 'jordan', 'maxSolarProduction': 58.33045692677052},
 {'_id': 'lebanon', 'maxSolarProduction': 15.153580203689925},
 {'_id': 'tunisia', 'maxSolarProduction': 0.6787589293911857},
 {'_id': 'uae', 'maxSolarProduction': 0.034140968315127765},
 {'_id': 'egypt', 'maxSolarProduction': 0.026434563845563263},
 {'_id': 'saudi arabia', 'maxSolarProduction': 1.3995593650078733e-05},
 {'_id': 'oman', 'maxSolarProduction': 0},
 {'_id': 'iraq', 'maxSolarProduction': 0},
 {'_id': 'yemen', 'maxSolarProduction': 0},
 {'_id': 'bahrain', 'maxSolarProduction': 0},
 {'_id': 'algeria', 'maxSolarProduction': 0},
 {'_id': 'libya', 'maxSolarProduction': 0},
 {'_id': 'qatar', 'maxSolarProduction': 0},
 {'_id': 'morocco', 'maxSolarProduction': 0},
 {'_id': 'kuwait', 'maxSolarProduction': 0},
 {'_id': 'syria', 'maxSolarProduction': 0}]
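To make each stage of the aggregation above explicit, here is a sketch of the same logic in plain Python over the list of country dicts: each nested loop stands in for an `$unwind`, each `continue` for a `$match`, and the running `max` per country for the `$group` with `$max`. `max_indicator_by_country` is a hypothetical name used only for illustration. One consequence is that a country with no numeric value for the indicator produces no row at all, which is why only 16 of the 17 countries appear in the result:

```python
def max_indicator_by_country(countries, subject_type, indicator_name, descending=True):
    """Plain-Python analogue of the $unwind/$match/$group($max)/$sort pipeline.

    Returns a list of {'_id': country name, 'maxValue': peak value} dicts,
    skipping years whose value is the empty string (missing data).
    """
    peaks = {}
    for country in countries:                            # one document per country
        for subject in country['subject']:               # $unwind: "$subject"
            if subject['type'] != subject_type:          # $match on subject.type
                continue
            for indicator in subject['indicator']:       # $unwind: "$subject.indicator"
                if indicator['name'] != indicator_name:  # $match on indicator name
                    continue
                for entry in indicator['values']:        # $unwind: "...indicator.values"
                    if entry['value'] == '':             # $match: value $ne ""
                        continue
                    name = country['name']               # $group by name with $max
                    peaks[name] = max(peaks.get(name, entry['value']), entry['value'])
    rows = [{'_id': name, 'maxValue': value} for name, value in peaks.items()]
    rows.sort(key=lambda row: row['maxValue'], reverse=descending)  # $sort
    return rows


# toy input: 'c' has only a missing value, so it disappears like a country with no data
toy_countries = [
    {'name': 'a', 'subject': [{'type': 'energy indicators', 'indicator': [
        {'name': 'Gross energy production - Solar', 'unit': '%',
         'values': [{'year': 2013, 'value': 1.0}, {'year': 2014, 'value': 3.0},
                    {'year': 2015, 'value': ''}]}]}]},
    {'name': 'b', 'subject': [{'type': 'energy indicators', 'indicator': [
        {'name': 'Gross energy production - Solar', 'unit': '%',
         'values': [{'year': 2014, 'value': 7.5}]}]}]},
    {'name': 'c', 'subject': [{'type': 'energy indicators', 'indicator': [
        {'name': 'Gross energy production - Solar', 'unit': '%',
         'values': [{'year': 2014, 'value': ''}]}]}]},
]
print(max_indicator_by_country(toy_countries, 'energy indicators',
                               'Gross energy production - Solar'))
# [{'_id': 'b', 'maxValue': 7.5}, {'_id': 'a', 'maxValue': 3.0}]
```

Calling it on `mena_json['countries']` would be expected to give the same peaks as the aggregation (the order among tied zeros may differ), and `descending=False` with `'Energy use per capita'` mirrors the next question's pipeline.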


2. Which countries in the dataset have the lowest energy usage per capita?
- note: TOE is "tonne of oil equivalent" and the figures are per capita per year
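To put the toe figures in more familiar units: under the IEA/OECD convention 1 toe = 41.868 GJ, and since 1 kWh = 3.6 MJ, that is 11,630 kWh. A small conversion sketch (the function names are illustrative):

```python
# 1 tonne of oil equivalent (toe) = 41.868 GJ (IEA/OECD convention)
GJ_PER_TOE = 41.868
MJ_PER_KWH = 3.6  # 1 kWh = 3.6 MJ by definition


def toe_to_gj(toe):
    return toe * GJ_PER_TOE


def toe_to_kwh(toe):
    # GJ -> MJ is x1000, then MJ -> kWh is /3.6
    return toe * GJ_PER_TOE * 1000 / MJ_PER_KWH


# algeria's 2014 energy use per capita (0.8995... toe per person per year)
print(round(toe_to_kwh(0.8995416388389418)))  # 10462
```

So algeria's roughly 0.9 toe per person corresponds to a little over 10,000 kWh of primary energy per person per year.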

In [11]:
mena_json['countries'][0]['subject'][1]['indicator'][30]

{'name': 'Energy use per capita',
 'unit': 'toe',
 'values': [{'year': 2010, 'value': 0.7358203107658158},
  {'year': 2011, 'value': 0.7528700163398693},
  {'year': 2012, 'value': 0.8144358974358974},
  {'year': 2013, 'value': 0.8351275202932705},
  {'year': 2014, 'value': 0.8995416388389418},
  {'year': 2015, 'value': ''},
  {'year': 2016, 'value': ''},
  {'year': 2017, 'value': ''}]}

In [12]:
least_usage_pc = db.countries.aggregate([
    {
        # expand elements of subject list for each country
        "$unwind": "$subject"
    },
    {
        # extract 'energy indicators' subject
        "$match": {
            "subject.type": "energy indicators"
        }
    },
    {
        # expand elements of indicator list for this subject type
        "$unwind": "$subject.indicator"
    },
    {
        # extract 'Energy use per capita' indicator
        "$match": {
            "subject.indicator.name": "Energy use per capita"
        }
    },
    {
        # expand elements of year list for this indicator
        "$unwind": "$subject.indicator.values"
    },
    {
        # exclude years with empty string as value
        "$match": {"subject.indicator.values.value": {"$ne": ""}}
    },
    {
        # for each country, keep its peak (highest) annual energy use per capita, in toe
        "$group": {
            "_id": "$name",
            "toePerCapita": {"$max": "$subject.indicator.values.value"}
        }
    },
    {
        "$sort": {
            "toePerCapita": 1
        }
    }
])

[country for country in least_usage_pc]

[{'_id': 'yemen', 'toePerCapita': 0.23276952604778692},
 {'_id': 'morocco', 'toePerCapita': 0.43197437841115827},
 {'_id': 'syria', 'toePerCapita': 0.6535419401544401},
 {'_id': 'egypt', 'toePerCapita': 0.660383410908223},
 {'_id': 'tunisia', 'toePerCapita': 0.70396},
 {'_id': 'jordan', 'toePerCapita': 0.7496350500715308},
 {'_id': 'iraq', 'toePerCapita': 0.7868724211165048},
 {'_id': 'lebanon', 'toePerCapita': 0.8898365853658536},
 {'_id': 'algeria', 'toePerCapita': 0.8995416388389418},
 {'_id': 'libya', 'toePerCapita': 2.1118398724082934},
 {'_id': 'bahrain', 'toePerCapita': 4.495690161527166},
 {'_id': 'saudi arabia', 'toePerCapita': 4.586784525736484},
 {'_id': 'oman', 'toePerCapita': 4.768878873239437},
 {'_id': 'kuwait', 'toePerCapita': 4.996120760233918},
 {'_id': 'uae', 'toePerCapita': 5.558555370985603},
 {'_id': 'qatar', 'toePerCapita': 8.554577419354839}]

3. Which countries have the most diverse range of energy generation solutions?

In [13]:
[indicator['name'] for indicator in mena_json['countries'][0]['subject'][1]['indicator']][:10]

['Gross energy production',
 'Gross energy production - Crude Oil and Oil Products',
 'Gross energy production - Natural Gas',
 'Gross energy production - Coal',
 'Gross energy production - Nuclear',
 'Gross energy production - Renewables',
 'Gross energy production - Solar',
 'Gross energy production - Wind',
 'Gross energy production - Hydropower',
 'Gross energy production - Other']

In [14]:
most_diverse_renewables = db.countries.aggregate([
    {
        # expand elements of subject list for each country
        "$unwind": "$subject"
    },
    {
        # extract 'energy indicators' subject
        "$match": {
            "subject.type": "energy indicators"
        }
    },
    {
        # expand elements of indicator list for this subject type
        "$unwind": "$subject.indicator"
    },
    {
        # keep only the per-source 'Gross energy production - <source>' indicators
        # (the bare 'Gross energy production' total does not match this prefix)
        "$match": {
            "subject.indicator.name": {"$regex": "^Gross energy production - "}
        }
    },
    {
        # expand elements of year list for this indicator
        "$unwind": "$subject.indicator.values"
    },
    {
        # exclude years that are missing (empty string) or zero in a single stage;
        # $nin takes a list of values, so no $and is needed
        "$match": {"subject.indicator.values.value": {"$nin": ["", 0]}}
    },
    {
        # for each country, collect the distinct energy production methods that have
        # at least one non-zero, non-empty year ($addToSet drops duplicates)
        "$group": {
            "_id": "$name",
            "energyGenerators": {"$addToSet": "$subject.indicator.name"}
        }
    },
])

# TODO (if time allows): rank methods by output and include the numbers for each
[country for country in most_diverse_renewables]

[{'_id': 'tunisia',
  'energyGenerators': ['Gross energy production - Solar',
   'Gross energy production - Wind',
   'Gross energy production - Natural Gas',
   'Gross energy production - Renewables',
   'Gross energy production - Hydropower',
   'Gross energy production - Crude Oil and Oil Products',
   'Gross energy production - Other']},
 {'_id': 'bahrain',
  'energyGenerators': ['Gross energy production - Other',
   'Gross energy production - Crude Oil and Oil Products',
   'Gross energy production - Natural Gas']},
 {'_id': 'algeria',
  'energyGenerators': ['Gross energy production - Crude Oil and Oil Products',
   'Gross energy production - Natural Gas',
   'Gross energy production - Hydropower',
   'Gross energy production - Renewables',
   'Gross energy production - Other']},
 {'_id': 'qatar',
  'energyGenerators': ['Gross energy production - Natural Gas',
   'Gross energy production - Other',
   'Gross energy production - Crude Oil and Oil Products']},
 {'_id': 'iraq',
  'ene
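The pipeline lists each country's production methods but does not rank countries by them. A sketch of that last step in plain Python, assuming the aggregation's output has been materialised as a list of `{'_id', 'energyGenerators'}` dicts (e.g. `list(most_diverse_renewables)` before the cursor is consumed). Because 'Gross energy production - Renewables' appears to be an umbrella figure over the individual renewable sources, it is excluded by default to avoid double counting; that is an assumption worth checking against the source documentation. `rank_by_diversity` is a hypothetical helper:

```python
def rank_by_diversity(rows, exclude=('Gross energy production - Renewables',)):
    """Sort countries by how many distinct non-zero production methods they report.

    `rows` is a list of {'_id': country, 'energyGenerators': [method names]} dicts,
    the shape produced by the aggregation above. Names in `exclude` are not counted
    (by default the 'Renewables' total, assumed to overlap the individual sources).
    """
    ranked = []
    for row in rows:
        methods = sorted(set(row['energyGenerators']) - set(exclude))
        ranked.append({'country': row['_id'], 'count': len(methods), 'methods': methods})
    # most diverse first; ties broken alphabetically by country for a stable order
    ranked.sort(key=lambda r: (-r['count'], r['country']))
    return ranked


# two rows copied from the output above
rows = [
    {'_id': 'bahrain', 'energyGenerators': [
        'Gross energy production - Other',
        'Gross energy production - Crude Oil and Oil Products',
        'Gross energy production - Natural Gas']},
    {'_id': 'tunisia', 'energyGenerators': [
        'Gross energy production - Solar', 'Gross energy production - Wind',
        'Gross energy production - Natural Gas', 'Gross energy production - Renewables',
        'Gross energy production - Hydropower',
        'Gross energy production - Crude Oil and Oil Products',
        'Gross energy production - Other']},
]
for r in rank_by_diversity(rows):
    print(r['country'], r['count'])
# tunisia 6
# bahrain 3
```

Passing `exclude=()` counts every listed method, including the 'Renewables' total.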

4. Which countries have the lowest average electricity cost in 2012-2016? (Note: we can't simply divide by 5 because some years are missing; drop the missing and zero values and take the mean of what remains.)
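One way to handle the missing years is to average only the numeric, non-zero values inside the window, so each country's mean uses however many years it actually reports. A plain-Python sketch using the 'Cost of electricity' indicator under 'electricity indicators' seen earlier; `average_in_window` and `rank_by_average_cost` are hypothetical helpers. (In a MongoDB `$group`, the `$avg` accumulator ignores non-numeric values, so the empty strings would drop out on their own, but zeros would still need an explicit `$match`.)

```python
def average_in_window(values, start=2012, end=2016):
    """Mean of the usable values for years start..end (inclusive).

    `values` is an indicator's list of {'year', 'value'} dicts. Empty strings
    (missing data) and zeros are skipped, so the divisor is the number of years
    that actually have data rather than a fixed 5. Returns None if none qualify.
    """
    usable = [
        v['value'] for v in values
        if start <= v['year'] <= end and v['value'] not in ('', 0)
    ]
    return sum(usable) / len(usable) if usable else None


def rank_by_average_cost(countries, indicator_name='Cost of electricity'):
    """Return (country, mean cost) pairs, cheapest first, skipping countries without data."""
    results = []
    for country in countries:
        for subject in country['subject']:
            if subject['type'] != 'electricity indicators':
                continue
            for indicator in subject['indicator']:
                if indicator['name'] == indicator_name:
                    mean = average_in_window(indicator['values'])
                    if mean is not None:
                        results.append((country['name'], mean))
    return sorted(results, key=lambda pair: pair[1])


# algeria's 'Cost of electricity' values, copied from the electricity indicators output
algeria_cost = [
    {'year': 2010, 'value': ''}, {'year': 2011, 'value': ''},
    {'year': 2012, 'value': 0.2016}, {'year': 2013, 'value': 0.2157},
    {'year': 2014, 'value': 0.1984}, {'year': 2015, 'value': 0.1563},
    {'year': 2016, 'value': 0.1184}, {'year': 2017, 'value': ''},
]
print(round(average_in_window(algeria_cost), 5))  # 0.17808
```

`rank_by_average_cost(mena_json['countries'])` would then rank every country; pass `indicator_name='Electricity price'` to rank by the other price series instead.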


5. Which countries experienced the greatest year-over-year (YoY) increase in renewables?
6. Which countries have the greatest ratio of renewable energy production to crude oil production?
7. Which countries have not invested in solar at all?
8. Which year saw the greatest average increase in solar in these countries?
9. Which year saw the greatest average decrease in cost per kWh in these countries?
10. Which countries are producing the most fossil fuels per capita (annual output divided by population)?

(These call for aggregations, filtering, sorting, etc.)
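Questions 5, 8 and 9 reduce to differences between consecutive years that both have data, and question 10 to a unit conversion. A hedged sketch of both in plain Python: `yoy_changes` only compares a year with the immediately preceding calendar year, and `fossil_toe_per_capita` assumes fossil production is the sum of the 'Crude Oil and Oil Products', 'Natural Gas' and 'Coal' shares (in %) of 'Gross energy production' (ktoe), divided by 'Population' (million). Since 1 ktoe = 1,000 toe and one million people = 10^6 people, each ktoe per million people is 0.001 toe per person. Both function names are hypothetical:

```python
def yoy_changes(values):
    """Return {year: value[year] - value[year - 1]} for consecutive years with data.

    `values` is an indicator's list of {'year', 'value'} dicts; empty strings
    (missing years) are skipped, and a change is only reported when the
    previous calendar year also has a value.
    """
    series = {v['year']: v['value'] for v in values if v['value'] != ''}
    return {
        year: series[year] - series[year - 1]
        for year in sorted(series)
        if year - 1 in series
    }


def fossil_toe_per_capita(gross_ktoe, fossil_shares_pct, population_million):
    """Fossil-fuel production per person, in toe, for a single year.

    gross_ktoe:          'Gross energy production' in ktoe
    fossil_shares_pct:   the crude oil, natural gas and coal shares, each in %
    population_million:  'Population' in millions
    """
    fossil_ktoe = gross_ktoe * sum(fossil_shares_pct) / 100
    # ktoe -> toe is x1000; million people -> people is x1e6
    return fossil_ktoe * 1_000 / (population_million * 1_000_000)


# illustrative numbers only, not taken from the dataset
print(yoy_changes([{'year': 2012, 'value': 1.0}, {'year': 2013, 'value': 2.5},
                   {'year': 2014, 'value': ''}, {'year': 2015, 'value': 4.0}]))
# {2013: 1.5}
print(fossil_toe_per_capita(100_000, [60, 40, 0], 10))
# 10.0
```

The renewables and solar indicators are shares in %, so their YoY changes come out in percentage points; for question 8, collect each country's `yoy_changes` on the solar values and average them per year.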

## Datavis question
1. What is the relationship between CO2 per capita and GDP per capita? Plot one point per country on a two-axis scatter plot.

Relevant indicator (under economic indicators): `CO2 emissions intensity`, unit `kg/$1000 of GDP`.
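A sketch of the data preparation for that plot: pair each country's 'GDP per capita' with a second economic indicator for one year, skipping countries missing either value. The exact `name` string of the CO2-per-capita indicator (index 14 in the economic indicators cell) isn't shown in the printed output, so it is left as a parameter here; list the indicator names first to find it. `scatter_points` is a hypothetical helper:

```python
def scatter_points(countries, year, x_name, y_name, subject_type='economic indicators'):
    """Collect (country, x, y) triples for one year, one per country, for a scatter plot.

    x_name and y_name are indicator 'name' strings under `subject_type`. Countries
    where either indicator is absent, or its value for `year` is missing (''),
    are skipped.
    """
    points = []
    for country in countries:
        # year -> value lookup for every indicator in the chosen subject
        by_name = {}
        for subject in country['subject']:
            if subject['type'] == subject_type:
                for indicator in subject['indicator']:
                    by_name[indicator['name']] = {
                        v['year']: v['value'] for v in indicator['values']
                    }
        x = by_name.get(x_name, {}).get(year, '')
        y = by_name.get(y_name, {}).get(year, '')
        if x != '' and y != '':
            points.append((country['name'], x, y))
    return points


# toy records shaped like the real data; 'toy CO2 indicator' is a placeholder name
toy = [
    {'name': 'p', 'subject': [{'type': 'economic indicators', 'indicator': [
        {'name': 'GDP per capita', 'values': [{'year': 2014, 'value': 5000}]},
        {'name': 'toy CO2 indicator', 'values': [{'year': 2014, 'value': 3.2}]},
    ]}]},
    {'name': 'q', 'subject': [{'type': 'economic indicators', 'indicator': [
        {'name': 'GDP per capita', 'values': [{'year': 2014, 'value': ''}]},
    ]}]},
]
print(scatter_points(toy, 2014, 'GDP per capita', 'toy CO2 indicator'))
# [('p', 5000, 3.2)]
```

With the real data, `[p[1] for p in pts]` and `[p[2] for p in pts]` can go straight into `matplotlib.pyplot.scatter`, and `plt.annotate(name, (x, y))` can label each point with its country.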


Step 4. Programmatic Visualization. For at least one of your questions, write a program to query Mongo
programmatically. I recommend PyMongo for this purpose, but you can use any language and libraries you like.
(PyMongo is nice because JSON documents are returned as lists of dictionaries that are very easy to process.)
Generate a visualization / plot of your result and provide an interpretation of your output, i.e., what does the
visualization reveal about your dataset? Try to highlight something non-obvious.

WHAT TO SUBMIT:
1. A copy of your JSON-formatted dataset from Step 1 (.zipped)
2. Questions, queries, and output that are the basis of your tutorial from Step 3.
Please submit this in .PDF format, not a Word document.
3. Code and your visualization for Step 4. Please format your figure as a .pdf, .png, or .jpg file.