![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fcurriculum-notebooks&branch=master&subPath=SocialStudies/HansardAnalysis/hansard-analysis.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Callysto's Weekly Data Visualization


## Sunset and Sunrise

### Recommended Grade levels: 5-9

### Instructions

Click "Cell" and select "Run All".

This will import the data and run all the code, so you can see this week's data visualization. Scroll back to the top after you’ve run the cells.

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don't need to do any coding to view the visualizations**.

The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer?
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question. 

## Question

What are the sunset, sunrise, and day lengths of particular areas around the world?

### Goal

Our goal is to show the differences between sunrise, sunset, and day lengths around the world and how timezones can drastically affect the times we see.

### Background

Welcome to the wonderful world of Sunrises and Sunsets! In this vibrant notebook, we embark on a breathtaking journey through the captivating beauty of nature's daily masterpieces: sunrises and sunsets. Have you ever wondered what the sunset times are in Calgary compared to Berlin? We will be exploring these differences throughout this notebook.

## Gather


Sunrise and sunset data was collected through the [Sunset and Sunrise](https://sunrise-sunset.org/api) API, which provides sunset and sunrise times when given a specific latitude and longitude. 

### Code: 

Run the code cells below to import the libraries we need for this project. Libraries are pre-made code that make it easier to analyze our data.

In [3]:
import requests
from datetime import date, datetime, timedelta
import pandas as pd
import plotly.express as px
import json
print('Libraries imported')

Libraries imported


[Pandas](https://pandas.pydata.org/) is a library that helps us with data analysis, and [plotly express](https://plotly.com/python/plotly-express/) is a library that helps us make visualizations. [Requests](https://requests.readthedocs.io/en/latest/) and [json](https://docs.python.org/3/library/json.html) help us read the sunset and sunrise data from an external API. If you are not familiar with APIs, an API is a way for one program to request information from another. For example, many Twitter bots use the Twitter API, through which Twitter sends the information these bots need in order to function. [Datetime](https://docs.python.org/3/library/datetime.html) provides tools for working with dates and times. [tzfpy](https://pypi.org/project/tzfpy/), which is not imported in this notebook, is a library that can find timezone names from longitude and latitude values.

Without importing these libraries we would have to use much more code to analyze our data and generate visualizations. We import the libraries with abbreviations, or aliases, so that we have less typing to do in each line of our code below.

### Data
We are using data from the [Sunset and Sunrise](https://sunrise-sunset.org/api) API. Run the code below to populate the data into a dataframe.

#### Import the Data

Let's try inputting the _latitude_ and _longitude_ of Calgary and see what the resulting data is.

In [4]:
lat = 51.049999
lng = -114.06666

query = {'lat': lat, 'lng': lng, 'date': 'today'}
response = requests.get(url='https://api.sunrise-sunset.org/json', params=query)

data = response.json()
print(f"Information about Calgary: {data}")
df = pd.json_normalize(data['results'])
display(df)

Information about Calgary: {'results': {'sunrise': '11:30:06 AM', 'sunset': '3:36:32 AM', 'solar_noon': '7:33:19 PM', 'day_length': '16:06:26', 'civil_twilight_begin': '10:48:48 AM', 'civil_twilight_end': '4:17:51 AM', 'nautical_twilight_begin': '9:47:13 AM', 'nautical_twilight_end': '5:19:26 AM', 'astronomical_twilight_begin': '12:00:01 AM', 'astronomical_twilight_end': '12:00:01 AM'}, 'status': 'OK'}


| | sunrise | sunset | solar_noon | day_length | civil_twilight_begin | civil_twilight_end | nautical_twilight_begin | nautical_twilight_end | astronomical_twilight_begin | astronomical_twilight_end |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11:30:06 AM | 3:36:32 AM | 7:33:19 PM | 16:06:26 | 10:48:48 AM | 4:17:51 AM | 9:47:13 AM | 5:19:26 AM | 12:00:01 AM | 12:00:01 AM |


### Comment on the data

The dataframe above is a data structure that allows Python to display data in an easily readable format, similar to a spreadsheet. 

Looking at the data, there are several things to take note of. In particular, we see that we have access to the `sunrise` and `sunset` times for a location at a particular _latitude_ and _longitude_. We also have access to `day_length`, which gives the length of the day on a particular date. However, one issue with this dataset is that times are given in the **UTC timezone**. Let's keep this in mind and _transform_ our data later.

In [5]:
sunrise = data['results']['sunrise']
sunset = data['results']['sunset']
print(f"At Calgary's latitude of {lat} and longitude of {lng}, the sunrise and sunset times are {sunrise} and {sunset} in the UTC timezone.")

At Calgary's latitude of 51.049999 and longitude of -114.06666, the sunrise and sunset times are 11:30:06 AM and 3:36:32 AM in the UTC timezone.


Imagine you're going on a trip around the world, and you want to know how the length of day and night changes depending on where you are. Well, it all has to do with two special lines called latitude and longitude.

Latitude is like a horizontal belt that runs around the Earth. The Equator is the most important line, dividing the Earth into the Northern and Southern Hemispheres.

Longitude is like a set of vertical lines that go from the North Pole to the South Pole. The Prime Meridian, which goes through Greenwich, London, is the main one.

When you're close to the middle of the Earth (the equator), daytime and nighttime are about the same length all year. But as you move away from the equator towards the North Pole or the South Pole, the difference grows: in summer the days become longer and the nights become shorter, and in winter the opposite happens. So, the further you are from the equator, the bigger the differences between daytime and nighttime.
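To see how latitude changes day length, here is a rough sketch that is not part of the original notebook. The function name `approx_day_length_hours` is hypothetical. It uses a common textbook approximation for the Sun's tilt and ignores the bending of light by the atmosphere, so its results can differ by several minutes from the times the API returns.

```python
import math

def approx_day_length_hours(latitude_deg, day_of_year):
    """Rough day length in hours from latitude and day of the year (1-365)."""
    # Approximate angle of the Sun above or below the equator (solar declination)
    declination = -23.44 * math.cos(math.radians(360 / 365 * (day_of_year + 10)))
    # Sunrise equation: cos(hour angle) = -tan(latitude) * tan(declination)
    cos_hour_angle = -math.tan(math.radians(latitude_deg)) * math.tan(math.radians(declination))
    # Clamp the value so polar day (24 h) and polar night (0 h) do not cause an error
    cos_hour_angle = max(-1.0, min(1.0, cos_hour_angle))
    hour_angle = math.degrees(math.acos(cos_hour_angle))
    # The Earth turns 15 degrees per hour, and the Sun is up for twice the hour angle
    return 2 * hour_angle / 15

for name, lat in [("Equator", 0.0), ("Calgary", 51.05), ("Far north", 80.0)]:
    june = approx_day_length_hours(lat, 172)      # around June 21
    december = approx_day_length_hours(lat, 355)  # around December 21
    print(f"{name}: June {june:.1f} h, December {december:.1f} h")
```

At the equator the estimate stays at 12 hours in both months, while the gap between June and December grows as the latitude increases.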

## Organize
An important part of the data science process is cleaning up and organizing your data so it can be useful for finding observations. Part of cleaning involves 
- identifying missing data
- removing missing data
- ensuring the data is all in the same format
- identifying and dealing with outliers. 

In our particular case, we have two main things to keep in mind. One is that our data is currently in the **UTC timezone**. Timezones are super important when it comes to sunset and sunrise times! Imagine you're on an adventure with your friends in different parts of the world. Each place has its own special time. Timezones help us keep track of these different times and make sure everyone knows when the sun will rise and set in their area. You see, the Earth is a big globe, and as it spins around, different parts face the sun at different times. So, when it's morning in one place, it might still be night in another place far away. Timezones help us organize and understand these differences. 

The other issue we need to fix is converting our `sunrise` and `sunset` times from the _12-hour_ format into the _24-hour_ format. This is useful in a number of ways but we'll go more into depth when we solve the issue directly. 

With a quick search we find out that Calgary is currently on Mountain Daylight Time (MDT), which is UTC-6. In winter, Calgary switches to Mountain Standard Time (MST), which is UTC-7. If you're having a hard time finding your own local timezone, use [this](https://www.timeanddate.com/worldclock/converter.html) site, which helps find the difference between your timezone and UTC.

Using the site listed, we know that MDT is 6 hours behind the **UTC timezone**, meaning we have to **subtract** 6 hours from our obtained `sunrise` and `sunset` times in order for the times to be in our local timezone. Therefore, we put **-6** in the `offset_hours` variable in the cell below.

In [6]:
# Difference in hours between UTC and MDT - Change this value later on in the notebook to the timezone you'd like to swap to
offset_hours = -6

If your timezone is ahead of UTC, instead of subtracting a certain number of hours, you would have to **add** a certain number of hours. For example, if your timezone is 4 hours ahead of UTC, you would put **4** into the `offset_hours` variable.
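If you would rather let Python look up the offset, the sketch below shows one possible approach. It is not part of the original notebook: it assumes Python 3.9 or newer (for the standard `zoneinfo` module) and a system timezone database, and the helper name `utc_offset_hours` is hypothetical.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

def utc_offset_hours(zone_name, year, month, day):
    """Return the UTC offset, in hours, for a named timezone on a given date."""
    # Use local noon so the answer is not affected by a clock change during the night
    local_noon = datetime(year, month, day, 12, tzinfo=ZoneInfo(zone_name))
    return local_noon.utcoffset().total_seconds() / 3600

# "America/Edmonton" is the timezone database name that covers Calgary
print(utc_offset_hours("America/Edmonton", 2023, 6, 1))  # summer (MDT)
print(utc_offset_hours("America/Edmonton", 2023, 1, 1))  # winter (MST)
```

The two dates give different offsets because of daylight saving time, which is why a single fixed `offset_hours` value is only correct for part of the year.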

In [14]:
data = {'location': ['Tokyo, Japan', 'London, United Kingdom', 'New York City, United States of America', 
                     'Dubai, United Arab Emirates', 'Hong Kong, China', 'Mumbai, India', 'Bangkok, Thailand',
                     'Paris, France', 'Istanbul, Turkey', 'Seoul, South Korea'],
        'latitude': [35.672855, 51.509865, 40.730610, 25.276987, 22.302711, 19.076090, 13.736717, 48.864716, 41.015137, 37.532600], 
        'longitude': [139.817413, -0.118092, -73.935242, 55.296249, 114.177216, 72.877426, 100.523184, 2.349014, 28.979530, 127.024612]}

places = pd.DataFrame(data=data)
display(places)

| | location | latitude | longitude |
|---|---|---|---|
| 0 | Tokyo, Japan | 35.672855 | 139.817413 |
| 1 | London, United Kingdom | 51.509865 | -0.118092 |
| 2 | New York City, United States of America | 40.73061 | -73.935242 |
| 3 | Dubai, United Arab Emirates | 25.276987 | 55.296249 |
| 4 | Hong Kong, China | 22.302711 | 114.177216 |
| 5 | Mumbai, India | 19.07609 | 72.877426 |
| 6 | Bangkok, Thailand | 13.736717 | 100.523184 |
| 7 | Paris, France | 48.864716 | 2.349014 |
| 8 | Istanbul, Turkey | 41.015137 | 28.97953 |
| 9 | Seoul, South Korea | 37.5326 | 127.024612 |


Listed above are some popular locations with their latitude and longitude values. Once again, you can find the timezones for these locations using an online [timezone converter](https://www.timeanddate.com/worldclock/converter.html).

We now have a way to look up the UTC offset for each location. Next, we need to convert our `sunrise` and `sunset` times from the _12-hour_ format to the _24-hour_ format and shift them from UTC into our local timezone.

In [8]:
def time_convert(time_str, timezone_swap: int):
    # Parse the 12-hour UTC time string, then shift it by the UTC offset in hours
    # (use a negative offset for timezones behind UTC, positive for those ahead)
    t = datetime.strptime(time_str, '%I:%M:%S %p') + timedelta(hours=timezone_swap)
    # Keep only the time of day; it prints in the 24-hour format HH:MM:SS
    return t.time()

print(time_convert(sunrise, offset_hours))

05:30:06


By converting our data into a consistent format we make plotting later on much easier and it makes comparing different time values consistent as **24** will always be the maximum while **0** is always the minimum.
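As a small illustration of the conversion described above, the sketch below applies the same idea to both Calgary values. The helper name `to_local_24h` is hypothetical and returns a string instead of a time object. Notice that the sunset time wraps around to the previous evening once 6 hours are subtracted.

```python
from datetime import datetime, timedelta

def to_local_24h(utc_time_12h, offset_hours):
    """Shift a 12-hour UTC time string by offset_hours and return HH:MM:SS."""
    parsed = datetime.strptime(utc_time_12h, "%I:%M:%S %p")
    shifted = parsed + timedelta(hours=offset_hours)
    return shifted.strftime("%H:%M:%S")

print(to_local_24h("11:30:06 AM", -6))  # sunrise from the Calgary example: 05:30:06
print(to_local_24h("3:36:32 AM", -6))   # sunset from the Calgary example: 21:36:32
```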

## Explore

Now that our data has been _transformed_, we can start exploring beyond the `sunset` and `sunrise` times for a single date. What if we wanted to look at how `sunrise` and `sunset` change over a couple of days, a month, or even a year? Let's find a way to set a particular range of dates in order to compare these values! 

(Note): The notebook makes one request to the API for each date, so when inputting ranges of dates longer than a year, the notebook will take a long time to run.

In [9]:
total_info = {"date": []}

def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)

start_date = date(2023, 4, 19)
end_date = date(2023, 5, 19)

for single_date in daterange(start_date, end_date):
    total_info["date"].append(single_date.strftime("%Y-%m-%d"))
data = pd.DataFrame(total_info)

display(data)

| | date |
|---|---|
| 0 | 2023-04-19 |
| 1 | 2023-04-20 |
| 2 | 2023-04-21 |
| 3 | 2023-04-22 |
| 4 | 2023-04-23 |
| 5 | 2023-04-24 |
| 6 | 2023-04-25 |
| 7 | 2023-04-26 |
| 8 | 2023-04-27 |
| 9 | 2023-04-28 |


Looking above, we see that each date is formatted as (YYYY-MM-DD), which is ideal for plotting as each date is in _sequential_ order. 
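The same list of dates could also be built with pandas directly. This is an alternative sketch, not the notebook's method. It assumes the same start and end dates as above and, like the `daterange` function, leaves out the end date itself.

```python
from datetime import date
import pandas as pd

start_date = date(2023, 4, 19)
end_date = date(2023, 5, 19)

# One entry per day from start_date up to, but not including, end_date
number_of_days = (end_date - start_date).days
dates = (
    pd.date_range(start=start_date, periods=number_of_days, freq="D")
    .strftime("%Y-%m-%d")
    .tolist()
)

print(dates[0], dates[-1], len(dates))
```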

In the cell below, change the `longitude` and `latitude` values, along with the number of hours needed to offset from the **UTC timezone**.

In [10]:
# Alter these longitude and latitude values below

longitude = -114.06666 # change this longitude value
latitude = 51.049999   # change this latitude value

# Number of hours to offset changed here - currently set to MDT (Calgary's summer timezone)
offset_hours = -6

Putting it all together, we can now find the `sunrise` and `sunset` times for each date in our range, converted to the _24-hour_ format.

In [11]:
# params format: latitude, longitude, date (YYYY-MM-DD)
sunset_info = {"sunset": []}
sunrise_info = {"sunrise": []}
day_length = {"day_length": []}

def sunrisesunset(lat, lng, date_str):
    # Ask the API for one date at one location; the times come back in UTC
    params = {"lat": lat, "lng": lng, "date": date_str}
    response = requests.get("https://api.sunrise-sunset.org/json", params=params)
    response.raise_for_status()  # stop early if the request failed
    results = json.loads(response.text)["results"]
    return results["sunrise"], results["sunset"], results["day_length"]


# One API request per date, so long date ranges take longer to run
for day in total_info['date']:
    sunrise_utc, sunset_utc, length = sunrisesunset(latitude, longitude, day)
    sunrise_info['sunrise'].append(time_convert(sunrise_utc, offset_hours))
    sunset_info['sunset'].append(time_convert(sunset_utc, offset_hours))
    day_length['day_length'].append(length)

sunset_info = pd.DataFrame(sunset_info)
sunrise_info = pd.DataFrame(sunrise_info)
day_length = pd.DataFrame(day_length)

Let's display everything in a single **dataframe** by joining the `date`, `sunrise`, `sunset`, and `day_length` dataframes.

In [12]:
temp_total_info = data.join(sunrise_info['sunrise'])
temp_total_info2 = temp_total_info.join(sunset_info["sunset"])
df = temp_total_info2.join(day_length['day_length'])
display(df)

| | date | sunrise | sunset | day_length |
|---|---|---|---|---|
| 0 | 2023-04-19 | 06:30:46 | 20:39:58 | 14:09:12 |
| 1 | 2023-04-20 | 06:28:43 | 20:41:37 | 14:12:54 |
| 2 | 2023-04-21 | 06:26:40 | 20:43:15 | 14:16:35 |
| 3 | 2023-04-22 | 06:24:38 | 20:44:53 | 14:20:15 |
| 4 | 2023-04-23 | 06:22:37 | 20:46:32 | 14:23:55 |
| 5 | 2023-04-24 | 06:20:37 | 20:48:10 | 14:27:33 |
| 6 | 2023-04-25 | 06:18:38 | 20:49:48 | 14:31:10 |
| 7 | 2023-04-26 | 06:16:40 | 20:51:26 | 14:34:46 |
| 8 | 2023-04-27 | 06:14:44 | 20:53:04 | 14:38:20 |
| 9 | 2023-04-28 | 06:12:48 | 20:54:41 | 14:41:53 |


From the information we gathered, we can now create _scatterplots_ of the `sunrise`, `sunset`, and `day_length` values for our chosen timezone and date range.

In [13]:

# Sunset times (red markers)
sunset_fig = px.scatter(df, x='date', y='sunset', title=f'Sunset Times for {offset_hours} timezone', color='sunset').update_layout(showlegend=False)
# Reverse the y-axis when the sunset values decrease over the date range
if (df['sunset'][0] > df['sunset'][1]):
    sunset_fig = sunset_fig.update_layout(yaxis=dict(autorange='reversed'))
sunset_fig.update_traces(marker_color='red').show()

# Sunrise times (light blue markers)
sunrise_fig = px.scatter(df, x='date', y='sunrise', title=f'Sunrise Times for {offset_hours} timezone', color='sunrise').update_layout(showlegend=False)
# Reverse the y-axis when the sunrise values decrease over the date range
if (df['sunrise'][0] > df['sunrise'][1]):
    sunrise_fig = sunrise_fig.update_layout(yaxis=dict(autorange='reversed'))
sunrise_fig.update_traces(marker_color='lightblue').show()

# Day lengths (yellow markers)
total_length = px.scatter(df, x='date', y='day_length', title=f'Day Length Times for {offset_hours} timezone', color='day_length').update_layout(showlegend=False)
# Reverse the y-axis when the day_length values decrease over the date range
if (df['day_length'][0] > df['day_length'][1]):
    total_length = total_length.update_layout(yaxis=dict(autorange='reversed'))
total_length.update_traces(marker_color='yellow').show()
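The `day_length` values are text such as `14:09:12`. If you would like to plot them on a numeric axis, one option is to convert them to decimal hours first. The helper below is a hypothetical sketch, not part of the original notebook.

```python
def day_length_to_hours(day_length):
    """Convert an 'HH:MM:SS' day length string to decimal hours."""
    hours, minutes, seconds = (int(part) for part in day_length.split(":"))
    return hours + minutes / 60 + seconds / 3600

# First day length from the table above
print(round(day_length_to_hours("14:09:12"), 2))  # 14.15
```

With the dataframe from this notebook, it could be applied as `df['day_length'].apply(day_length_to_hours)` to create a new numeric column.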

## Interpret

In the scatter plots you can see the dates on the x-axis and the corresponding `sunrise`, `sunset`, and `day_length` values on the y-axis. 

The color of the dots helps to differentiate the plots, with `sunset` being red, `sunrise` being light blue, and `day_length` being yellow.

We notice that when the range of dates is set to approximately a month, `sunrise` and `sunset` times follow a roughly _linear_ relationship: from one date to the next, the times consistently go up or down by about the same amount, making the plot look like a straight line. Why do you think these values follow a certain _trend_?
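To put a number on that trend, the sketch below fits a straight line to the ten sunrise times shown in the table above, using the ordinary least-squares slope formula. It is an illustration under stated assumptions rather than part of the original notebook; the helper `to_minutes` is hypothetical.

```python
# Sunrise times copied from the first ten rows of the table above
sunrise_times = ["06:30:46", "06:28:43", "06:26:40", "06:24:38", "06:22:37",
                 "06:20:37", "06:18:38", "06:16:40", "06:14:44", "06:12:48"]

def to_minutes(time_str):
    """Convert an 'HH:MM:SS' string to minutes after midnight."""
    hours, minutes, seconds = (int(part) for part in time_str.split(":"))
    return hours * 60 + minutes + seconds / 60

days = list(range(len(sunrise_times)))          # 0, 1, 2, ... days since the first date
minutes = [to_minutes(t) for t in sunrise_times]

# Least-squares slope: how many minutes the sunrise time changes per day
mean_day = sum(days) / len(days)
mean_minutes = sum(minutes) / len(minutes)
slope = (sum((d - mean_day) * (m - mean_minutes) for d, m in zip(days, minutes))
         / sum((d - mean_day) ** 2 for d in days))

print(f"Sunrise changes by about {slope:.2f} minutes per day")
```

A negative slope means the sunrise is getting earlier each day over this range of dates.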

## Reflect on what you see

After making your visualization, the next step is to use the data and your visualization to answer the question. Look at and interact with the visualizations above. When you hover your mouse over the plots, you’ll notice more information appears.

#### Think about the following questions.

* What do you notice about these graphs?
* What do you wonder about the data?
* What kind of inferences can you make based on this data?
* Is there another way to visualize this data that would change your interpretation of the information? 


#### Use the fill-in-the-blank prompts to summarize your thoughts.
* "I used to think _______"
* "Now I think _______"
* "I wish I knew more about _______"
* "These data visualizations remind me of _______"
* "I really like _______"

## Communicate

If you have not yet done this, use the plots to answer our question about which timezones have the largest differences between sunrise and sunset times. Conversely, which areas have the smallest differences?

How can you communicate that information? What kind of product could you create to share that information with your school community and wider community?

Consider tagging Callysto on [Twitter](https://twitter.com/callysto_canada), [YouTube](https://www.youtube.com/Callysto), [TikTok](https://www.tiktok.com/@callysto_canada), [Facebook](https://www.facebook.com/callystocanada/), or [Linkedin](https://www.linkedin.com/company/callysto-canada/) if you decide to share your reflections or projects on social media.

## Further Resources

Other sources for sunrise and sunset times, such as the sunset and sunrise times of Calgary, can be found [here](https://www.timeanddate.com/sun/canada/calgary).

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)