# {Project Title} 📝

![Banner](./assets/banner.jpeg)

## Topic
*What problem are you (or your stakeholder) trying to address?*
📝 <!-- Answer Below -->

The problem is to identify the best places on Earth for astronomical viewing.

“Light pollution limits the visibility of [the] milky way to the unaided eye, the [visibility] of nebulae and galaxies seen in telescopes, and raises the noise on CCD astrophotographs. Only the observation of planets and double stars is unaffected. Low light pollution conditions, or dark skies, is one of the most important properties of a good [astronomical] observing site.” Attilla Danko

As industrial areas and human settlements expand, light and air pollution are increasing across Earth, so the areas suitable for astronomical observing are dwindling, even assuming consistent weather patterns. I would therefore like to analyze several variables across Earth’s surface that affect astronomical viewing and identify the best regions. Amateur and professional astronomers can make their best observations in those regions, and can potentially petition for dark sky preserves there so that future generations can view the cosmos in its full beauty.


## Project Question
*What specific question are you seeking to answer with this project?*
*This is not the same as the questions you ask to limit the scope of the project.*
📝 <!-- Answer Below -->

What regions of Earth have the best combination of environmental variables for astronomical viewing? 

Of those regions, which are closest to large populations? A dark site is of little use if the vast majority of people have to travel far to reach it.


## What would an answer look like?
*What is your hypothesized answer to your question?*
📝 <!-- Answer Below -->

The answer will look like a world map with regions highlighted by a color scale according to how optimal the regions are for astronomical observing. The primary variable is light pollution, so the final map with all variables factored in will look something like this:

https://djlorenz.github.io/astronomy/lp2024/world2024_low3.png
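One way to fold several variables into a single color-scale value is a weighted score over normalized inputs. The sketch below is only an illustration under stated assumptions: the region names, the numeric values, the column names, and the 0.7/0.3 weights are all hypothetical placeholders, not measurements or choices made by this project.

```python
import pandas as pd

# Hypothetical per-region values (NOT real measurements); lower is better for both.
regions = pd.DataFrame({
    "region": ["A", "B", "C"],
    "light_pollution": [0.2, 5.0, 1.0],
    "air_pollution": [40.0, 80.0, 10.0],
})

def min_max(series):
    """Rescale a column to the range [0, 1]; a constant column maps to all zeros."""
    span = series.max() - series.min()
    return (series - series.min()) / span if span else series * 0.0

# Assumed weights: light pollution is treated as the primary variable.
weights = {"light_pollution": 0.7, "air_pollution": 0.3}

# Weighted penalty in [0, 1]; the viewing score is its complement (1 = best).
penalty = sum(w * min_max(regions[col]) for col, w in weights.items())
regions["viewing_score"] = 1.0 - penalty

best = regions.sort_values("viewing_score", ascending=False)
print(best[["region", "viewing_score"]])
```

The `viewing_score` column could then drive the color scale of a map. Min-max scaling is only one option; a different normalization or weighting would change the ranking.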


In [1]:
import pandas as pd
import numpy as np
import matplotlib as mpl

import requests
from bs4 import BeautifulSoup
from io import StringIO

## Data Sources
*What 3 data sources have you identified for this project?*
*How are you going to relate these datasets?*
📝 <!-- Answer Below -->

Data:

https://www.kaggle.com/datasets/sazidthe1/global-air-pollution-data

https://gco.iarc.fr/today/en/dataviz/maps-prevalence?mode=population&age_end=17&age_start=0&options_indicator=%5Bobject%20Object%5D_%5Bobject%20Object%5D&types=2&cancers=40

https://vizhub.healthdata.org/gbd-results/

https://en.wikipedia.org/wiki/List_of_countries_by_life_expectancy

https://ourworldindata.org/grapher/coal-consumption-by-country-terawatt-hours-twh
Data sources: Energy Institute - Statistical Review of World Energy (2025) – with major processing by Our World in Data


Relating the data sets:
Join each data set to the air pollution data by country, then plot each variable against air pollution and fit a linear or polynomial regression to look for patterns and correlations.
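As a rough sketch of that comparison, the example below joins two small tables on country name and fits linear and quadratic models with `np.polyfit`. The `Entity` and `coal_consumption_twh` column names match the coal data used later in this notebook; the `country` and `aqi` columns and all of the values are hypothetical stand-ins, since the air pollution data has not been loaded yet.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real data sets (values are made up for illustration).
coal = pd.DataFrame({
    "Entity": ["A", "B", "C", "D"],
    "coal_consumption_twh": [10.0, 20.0, 30.0, 40.0],
})
air = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],   # hypothetical column name
    "aqi": [25.0, 45.0, 65.0, 85.0, 50.0],  # hypothetical column name
})

# An inner join keeps only countries present in both tables ("E" is dropped).
merged = coal.merge(air, left_on="Entity", right_on="country", how="inner")

x = merged["coal_consumption_twh"].to_numpy()
y = merged["aqi"].to_numpy()

# Linear fit: y ~ slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Quadratic fit, highest-order coefficient first.
quad_coeffs = np.polyfit(x, y, deg=2)

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(x, y)[0, 1]

print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

Country names often differ between sources, so joining on ISO codes (such as the `Code` column in the coal data) may prove more reliable than joining on names.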


## Approach and Analysis
*What is your approach to answering your project question?*
*How will you use the identified data to answer your project question?*
📝 <!-- Start Discussing the project here; you can add as many code cells as you need -->

In [8]:
# Fetch the data.
coal_df = pd.read_csv("https://ourworldindata.org/grapher/coal-consumption-by-country-terawatt-hours-twh.csv?v=1&csvType=full&useColumnShortNames=true", 
                 storage_options = {'User-Agent': 'Our World In Data data fetch/1.0'})
# Fetch the metadata
#metadata = requests.get("https://ourworldindata.org/grapher/coal-consumption-by-country-terawatt-hours-twh.metadata.json?v=1&csvType=full&useColumnShortNames=true").json()

In [9]:
coal_df.sample(5)

Unnamed: 0,Entity,Code,Year,coal_consumption_twh
4490,Peru,PER,2005,10.716667
2124,Hong Kong,HKG,2014,94.72611
5353,Spain,ESP,2013,132.54712
4981,Slovakia,SVK,1976,78.09544
4290,Other South America (EI),,1985,0.044388


In [3]:
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_life_expectancy'
headers = {'User-Agent': 'PyRequests/2.14'}
page = requests.get(url, headers=headers)
#print(page.content)

In [6]:
soup = BeautifulSoup(page.content,'html.parser')
tables = soup.find_all('table')
table_IO = StringIO(str(tables[1]))
life_expect_df = pd.read_html(table_IO)[0]

In [10]:
life_expect_df.sample(5)

Unnamed: 0_level_0,Locations,Life expectancy overall,Life expectancy overall,Life expectancy overall,Life expectancy overall,Life expectancy overall,Life expectancy overall,Life expectancy overall,Male,Male,Male,Male,Female,Female,Female,Female,Sex gap,Sex gap,Sex gap,Sex gap,Unnamed: 20_level_0
Unnamed: 0_level_1,Locations,at birth,bonus 0→15,at 15,bonus 15→65,at 65,bonus 65→80,at 80,at birth,at 15,...,at 80,at birth,at 15,at 65,at 80,at birth,at 15,at 65,at 80,Unnamed: 20_level_1
43,Isle of Man,81.0,1.31,67.31,3.25,20.56,4.21,9.78,78.93,65.29,...,8.84,83.13,69.38,21.93,10.55,4.21,4.08,2.77,1.71,
191,Kenya,63.65,3.18,51.83,12.19,14.02,7.94,6.96,61.46,49.74,...,6.31,65.92,53.99,14.97,7.34,4.46,4.24,2.07,1.03,
61,Uruguay,78.14,0.61,63.75,5.19,18.94,5.9,9.84,74.19,59.82,...,7.72,81.92,67.51,21.31,11.06,7.73,7.69,5.29,3.34,
12,Norway,83.31,0.25,68.56,2.19,20.75,3.53,9.28,81.75,67.02,...,8.37,84.85,70.07,21.84,10.02,3.1,3.05,2.27,1.65,
116,Trinidad and Tobago,73.49,1.36,59.85,6.49,16.34,5.54,6.89,70.38,56.81,...,6.18,76.68,62.96,17.61,7.32,6.3,6.14,2.77,1.14,
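The scraped table above has two header levels, which makes column selection and joins awkward. One possible cleanup, shown here as a sketch, is to flatten the `MultiIndex` columns into single strings. The small frame below mimics three of the columns using values from the sample output; the `flatten` helper is a hypothetical name introduced for illustration.

```python
import pandas as pd

# Small stand-in with the same two-level header shape as life_expect_df.
columns = pd.MultiIndex.from_tuples([
    ("Locations", "Locations"),
    ("Life expectancy overall", "at birth"),
    ("Male", "at birth"),
])
sample = pd.DataFrame(
    [["Norway", 83.31, 81.75], ["Kenya", 63.65, 61.46]],
    columns=columns,
)

def flatten(col):
    """Join the two header levels, collapsing repeats like ('Locations', 'Locations')."""
    top, bottom = col
    return top if top == bottom else f"{top} {bottom}"

flat = sample.copy()
flat.columns = [flatten(c) for c in sample.columns]

print(list(flat.columns))
```

With flat names, a column such as `"Life expectancy overall at birth"` can be selected directly and the frame joined to other data sets on `"Locations"`.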


## Resources and References
*What resources and references have you used for this project?*
📝 <!-- Answer Below -->

In [2]:
# ⚠️ Make sure you run this cell at the end of your notebook before every submission!
!jupyter nbconvert --to python source.ipynb

[NbConvertApp] Converting notebook source.ipynb to python
[NbConvertApp] Writing 1271 bytes to source.py
