In [148]:
import altair as alt
import numpy as np
import pandas as pd

# The VegaFusion transformer lets Altair chart datasets beyond its default 5,000-row limit.
alt.data_transformers.enable('vegafusion')

### Visualization Project Part 1: Finding your Data
---
Locate a dataset that you are interested in working with. The data should be sufficiently complex that you can ask lots of questions about it and engage in creative design techniques, but not so complex that you need specialized hardware or algorithmic approaches to analyze it. While you are welcome to use any data you’d like, I recommend a dataset that is tabular (e.g., CSV, TSV, or SQL), contains 5,000 or fewer data points (on the order of one hundred or so tends to be sufficiently interesting without causing lag in Altair), and that you’re comfortable discussing as part of the course (e.g., avoid data that is overly private or classified).

Discuss your dataset, including the data’s source, key attributes/dimensions of the data, and your goals for working with that data (i.e., what are the key questions you want to answer). Identify existing relevant visualizations for working with that data (either using the same data, showing the same concepts, or just that might provide some inspiration) and critique those visualizations based on the practices from this module. What works well? What might need to be improved or changed to answer your target questions?
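
As a rough illustration of the size guidance above, the sketch below checks a table against a row budget and, if needed, draws a reproducible random sample. The helper name `fit_to_row_budget`, the `MAX_ROWS` constant, and the stand-in data are assumptions made for this example only; they are not part of the assignment.

```python
import pandas as pd

# Hypothetical row budget taken from the "5,000 or fewer" recommendation.
MAX_ROWS = 5_000


def fit_to_row_budget(df: pd.DataFrame, max_rows: int = MAX_ROWS, seed: int = 0) -> pd.DataFrame:
    """Return df unchanged if it fits the budget, otherwise a reproducible random sample."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)


# Stand-in data; a real dataset would come from pd.read_csv(...).
demo = pd.DataFrame({'id': range(6_718), 'value': range(6_718)})
small = fit_to_row_budget(demo)
print(demo.shape, '->', small.shape)
```

Sampling trades completeness for responsiveness, so aggregating before plotting may be preferable when rare categories matter.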

In [149]:
sat_df = pd.read_csv('UCS-Satellite-Database-1-1-2023.csv')
country_codes_df = pd.read_csv('iso_3166_country_codes.csv')

In [150]:
print(sat_df.shape)
print(sat_df.columns)

(6718, 68)
Index(['Name of Satellite, Alternate Names',
       'Current Official Name of Satellite', 'Country/Org of UN Registry',
       'Country of Operator/Owner', 'Operator/Owner', 'Users', 'Purpose',
       'Detailed Purpose', 'Class of Orbit', 'Type of Orbit',
       'Longitude of GEO (degrees)', 'Perigee (km)', 'Apogee (km)',
       'Eccentricity', 'Inclination (degrees)', 'Period (minutes)',
       'Launch Mass (kg.)', ' Dry Mass (kg.) ', 'Power (watts)',
       'Date of Launch', 'Expected Lifetime (yrs.)', 'Contractor',
       'Country of Contractor', 'Launch Site', 'Launch Vehicle',
       'COSPAR Number', 'NORAD Number', 'Comments', 'Unnamed: 28',
       'Source Used for Orbital Data', 'Source', 'Source.1', 'Source.2',
       'Source.3', 'Source.4', 'Source.5', 'Source.6', 'Unnamed: 37',
       'Unnamed: 38', 'Unnamed: 39', 'Unnamed: 40', 'Unnamed: 41',
       'Unnamed: 42', 'Unnamed: 43', 'Unnamed: 44', 'Unnamed: 45',
       'Unnamed: 46', 'Unnamed: 47', 'Unnamed: 48', 'Unn

In [151]:
print(sat_df['Country of Operator/Owner'].unique())
# Twenty most common (country, operator) pairs by satellite count.
print(sat_df.groupby(['Country of Operator/Owner', 'Operator/Owner']).size().sort_values(ascending=False).head(20))

['USA' 'Finland' 'Denmark' 'Multinational' 'Israel' 'ESA' 'Lithuania'
 'Norway' 'Spain' 'United Arab Emirates' 'Algeria' 'Japan' 'Brazil'
 'Kazakhstan' 'Russia' 'South Korea' 'Angola' 'Canada' 'Argentina'
 'USA/Argentina' 'China' 'Belgium' 'Turkey' 'Luxembourg' 'Switzerland'
 'India' 'France/Italy' 'Singapore' 'Azerbaijan' 'Bangladesh'
 'Czech Republic' 'Germany' 'China ' 'Belarus' 'Netherlands' 'Indonesia'
 'France' 'Australia' 'Bulgaria' 'China/Brazil' 'United Kingdom'
 'China/France' 'Tunisia' 'Taiwan' 'Italy' 'Mexico' 'Ecuador' 'Egypt'
 'USA/Canada/Japan' 'USA/Japan/Brazil' 'Ethiopia' 'Colombia' 'USA/Japan'
 'USA/Germany' 'France/Italy/Belgium/Spain/Greece' 'Greece'
 'Greece/United Kingdom' 'ESA/' 'United Kingdom/ESA' 'Malaysia'
 'USA/India/Singapore/Taiwan' 'ESA/Russia' 'Thailand' 'USA/France'
 'Japan/Singapore' 'Jordan' 'Iran' 'Saudi Arabia'
 'United Kingdom/Netherlands' 'Laos' 'Morocco/Germany' 'South Africa'
 'India/France' 'Vietnam' 'Morocco' 'USA/Canada' 'Slovenia' 'Sinapore'

In [152]:
# Keep the first 28 columns; the rest are source-citation and empty 'Unnamed' columns.
sat_df = sat_df.iloc[:, :28]
# Print the number of distinct values in each column (NaN counts as a value).
for col in sat_df:
    print(col, '|', sat_df[col].nunique(dropna=False))

Name of Satellite, Alternate Names | 6709
Current Official Name of Satellite | 6698
Country/Org of UN Registry | 70
Country of Operator/Owner | 104
Operator/Owner | 639
Users | 20
Purpose | 31
Detailed Purpose | 53
Class of Orbit | 5
Type of Orbit | 9
Longitude of GEO (degrees) | 446
Perigee (km) | 783
Apogee (km) | 777
Eccentricity | 796
Inclination (degrees) | 450
Period (minutes) | 580
Launch Mass (kg.) | 567
 Dry Mass (kg.)  | 172
Power (watts) | 153
Date of Launch | 1187
Expected Lifetime (yrs.) | 29
Contractor | 560
Country of Contractor | 103
Launch Site | 39
Launch Vehicle | 164
COSPAR Number | 6707
NORAD Number | 6703
Comments | 1288
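
The listing above shows a column name padded with spaces (`' Dry Mass (kg.) '`), and the country values seen earlier include `'China '` with a trailing space. One possible cleanup, sketched here on a small stand-in frame, strips whitespace from column names and from string cells only. The helper name `strip_whitespace` is hypothetical.

```python
import pandas as pd


def strip_whitespace(df: pd.DataFrame) -> pd.DataFrame:
    """Strip leading/trailing whitespace from column names and from string cells."""
    out = df.copy()
    out.columns = out.columns.str.strip()
    for col in out.columns:
        # Only touch str values so numbers and NaN pass through unchanged.
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    return out


# Stand-in rows that mimic the two whitespace problems seen in the real data.
demo = pd.DataFrame({
    ' Dry Mass (kg.) ': [100, 250],
    'Country of Operator/Owner': ['China ', 'China'],
})
clean = strip_whitespace(demo)
print(list(clean.columns))
print(clean['Country of Operator/Owner'].nunique())
```

Applying a step like this before counting distinct values would merge labels that differ only by padding.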


In [153]:
country_codes_df = country_codes_df.rename({'name' : 'Country of Operator/Owner'}, axis = 1)

# Harmonize country names so both dataframes share the same labels before merging.
iso_to_ucs_names = {
    'United States of America': 'USA',
    'United Kingdom of Great Britain and Northern Ireland': 'United Kingdom',
    'Russian Federation': 'Russia',
    'Korea, Republic of': 'South Korea',
    'Taiwan, Province of China': 'Taiwan',
    'Iran (Islamic Republic of)': 'Iran',
    "Lao People's Democratic Republic": 'Laos',
    'Viet Nam': 'Vietnam',
    'Venezuela (Bolivarian Republic of)': 'Venezuela',
    'Bolivia (Plurinational State of)': 'Bolivia',
}
country_codes_df['Country of Operator/Owner'] = country_codes_df['Country of Operator/Owner'].replace(iso_to_ucs_names)

# Fix misspellings and inconsistent labels in the satellite data.
sat_name_fixes = {
    'ESA/': 'USA',  # This is the Hubble Telescope
    'Czech Republic': 'Czechia',
    'China ': 'China',
    'Sinapore': 'Singapore',
}
sat_df['Country of Operator/Owner'] = sat_df['Country of Operator/Owner'].replace(sat_name_fixes)
print(country_codes_df.head())
sat_df = sat_df.merge(country_codes_df, on = 'Country of Operator/Owner', how = 'left')
display(sat_df.head())

  Country of Operator/Owner alpha-3  country-code
0               Afghanistan     AFG             4
1             Åland Islands     ALA           248
2                   Albania     ALB             8
3                   Algeria     DZA            12
4            American Samoa     ASM            16


Unnamed: 0,"Name of Satellite, Alternate Names",Current Official Name of Satellite,Country/Org of UN Registry,Country of Operator/Owner,Operator/Owner,Users,Purpose,Detailed Purpose,Class of Orbit,Type of Orbit,...,Expected Lifetime (yrs.),Contractor,Country of Contractor,Launch Site,Launch Vehicle,COSPAR Number,NORAD Number,Comments,alpha-3,country-code
0,1HOPSAT-TD (1st-generation High Optical Perfor...,1HOPSAT-TD,NR,USA,Hera Systems,Commercial,Earth Observation,Infrared Imaging,LEO,Non-Polar Inclined,...,0.5,Hera Systems,USA,Satish Dhawan Space Centre,PSLV,2019-089H,44859,Pathfinder for planned earth observation const...,USA,840.0
1,Aalto-1,Aalto-1,Finland,Finland,Aalto University,Civil,Technology Development,,LEO,Sun-Synchronous,...,2.0,Aalto University,Finland,Satish Dhawan Space Centre,PSLV,2017-036L,42775,Technology development and education.,FIN,246.0
2,AAt-4,AAt-4,Denmark,Denmark,University of Aalborg,Civil,Earth Observation,Automatic Identification System (AIS),LEO,Sun-Synchronous,...,,University of Aalborg,Denmark,Guiana Space Center,Soyuz-2.1a,2016-025E,41460,Carries AIS system.,DNK,208.0
3,"ABS-2 (Koreasat-8, ST-3)",ABS-2,NR,Multinational,Asia Broadcast Satellite Ltd.,Commercial,Communications,,GEO,,...,15.0,Space Systems/Loral,USA,Guiana Space Center,Ariane 5 ECA,2014-006A,39508,"32 C-band, 51 Ku-band, and 6 Ka-band transpond...",,
4,ABS-2A,ABS-2A,NR,Multinational,Asia Broadcast Satellite Ltd.,Commercial,Communications,,GEO,,...,15.0,Boeing Satellite Systems,USA,Cape Canaveral,Falcon 9,2016-038A,41588,,,
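
The `NaN` codes in the rows above come from labels that have no match in the country-code table. A minimal sketch of one way to audit a left merge is shown below, using the `indicator` and `validate` options of `DataFrame.merge` on small stand-in frames (the frames `sats` and `codes` are illustrative, not the real data).

```python
import pandas as pd

# Stand-in frames; the real notebook merges sat_df with country_codes_df.
sats = pd.DataFrame({
    'Country of Operator/Owner': ['USA', 'Finland', 'Multinational', 'USA/Japan'],
})
codes = pd.DataFrame({
    'Country of Operator/Owner': ['USA', 'Finland', 'Japan'],
    'alpha-3': ['USA', 'FIN', 'JPN'],
})

merged = sats.merge(
    codes,
    on='Country of Operator/Owner',
    how='left',
    validate='m:1',   # raise if a country name appears twice in the code table
    indicator=True,   # adds a '_merge' column: 'both' or 'left_only'
)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Country of Operator/Owner']
print(unmatched.value_counts())
```

With `validate='m:1'`, a duplicated name in the code table fails loudly instead of silently multiplying satellite rows.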


In [169]:
print(sat_df['Country of Operator/Owner'].where(sat_df['country-code'].isna()).unique())

[nan 'Multinational' 'ESA' 'USA/Argentina' 'France/Italy' 'China/Brazil'
 'China/France' 'USA/Canada/Japan' 'USA/Japan/Brazil' 'USA/Japan'
 'USA/Germany' 'France/Italy/Belgium/Spain/Greece' 'Greece/United Kingdom'
 'United Kingdom/ESA' 'USA/India/Singapore/Taiwan' 'ESA/Russia'
 'USA/France' 'Japan/Singapore' 'United Kingdom/Netherlands'
 'Morocco/Germany' 'India/France' 'USA/Canada' 'India/Canada'
 'France/Belgium/Sweden' 'Singapore/Taiwan' 'Poland/UK'
 'USA/United Kingdom/Italy' 'Turkmenistan/Monaco' 'France/Israel'
 'China/Italy']


In [174]:
sat_df.loc[sat_df['country-code'].isna(), 'Country of Operator/Owner'].size

168
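
Many of the unmatched labels are slash-separated partnerships such as `'USA/Japan'`. One possible treatment, sketched below on stand-in data, splits each label into its partners, gives every partner an equal fractional share so totals still add up to the number of satellites, and then merges the codes. The equal-share weighting and the `share` column are assumptions of this example, not something the dataset prescribes.

```python
import pandas as pd

# Stand-in frames with hypothetical satellite names.
sats = pd.DataFrame({
    'Current Official Name of Satellite': ['SAT-A', 'SAT-B', 'SAT-C'],
    'Country of Operator/Owner': ['USA/Japan', 'France', 'Multinational'],
})
codes = pd.DataFrame({
    'Country of Operator/Owner': ['USA', 'Japan', 'France'],
    'alpha-3': ['USA', 'JPN', 'FRA'],
})

# Split 'USA/Japan' into ['USA', 'Japan']; single countries become one-item lists.
partners = sats['Country of Operator/Owner'].str.split('/')
exploded = (
    sats.assign(**{
        'Country of Operator/Owner': partners,
        'share': 1 / partners.str.len(),  # equal split among partners (assumption)
    })
    .explode('Country of Operator/Owner')  # one row per partner country
    .merge(codes, on='Country of Operator/Owner', how='left')
)
print(exploded)
```

Labels that are not countries, such as `'Multinational'` or `'ESA'`, would still come out without a code, and abbreviations like `'UK'` in `'Poland/UK'` would need their own name fix first.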

In [175]:
# `== None` is always False element-wise, so use isna() to look for missing names.
display(country_codes_df.loc[country_codes_df['Country of Operator/Owner'].isna()])
display(sat_df.loc[sat_df['Country of Operator/Owner'] == 'Multinational'])

Unnamed: 0,Country of Operator/Owner,alpha-3,country-code


Unnamed: 0,"Name of Satellite, Alternate Names",Current Official Name of Satellite,Country/Org of UN Registry,Country of Operator/Owner,Operator/Owner,Users,Purpose,Detailed Purpose,Class of Orbit,Type of Orbit,...,Expected Lifetime (yrs.),Contractor,Country of Contractor,Launch Site,Launch Vehicle,COSPAR Number,NORAD Number,Comments,alpha-3,country-code
3,"ABS-2 (Koreasat-8, ST-3)",ABS-2,NR,Multinational,Asia Broadcast Satellite Ltd.,Commercial,Communications,,GEO,,...,15.0,Space Systems/Loral,USA,Guiana Space Center,Ariane 5 ECA,2014-006A,39508,"32 C-band, 51 Ku-band, and 6 Ka-band transpond...",,
4,ABS-2A,ABS-2A,NR,Multinational,Asia Broadcast Satellite Ltd.,Commercial,Communications,,GEO,,...,15.0,Boeing Satellite Systems,USA,Cape Canaveral,Falcon 9,2016-038A,41588,,,
5,ABS-3A,ABS-3A,NR,Multinational,Asia Broadcast Satellite Ltd.,Commercial,Communications,,GEO,,...,15.0,Boeing Satellite Systems,USA,Cape Canaveral,Falcon 9,2015-010A,40424,Coverage of Americas Europe and Africa.,,
6,"ABS-4 (ABS-2i, MBSat, Mobile Broadcasting Sate...",ABS-4,NR,Multinational,Asia Broadcast Satellite Ltd.,Commercial,Communications,,GEO,,...,12.0,Space Systems/Loral,USA,Cape Canaveral,Atlas 3,2004-007A,28184,Purchased by ABS in 2013.,,
7,"ABS-6 (ABS-1, LMI-1, Lockheed Martin-Intersput...",ABS-6,NR,Multinational,Asia Broadcast Satellite Ltd.,Commercial,Communications,,GEO,,...,15.0,Lockheed Martin,USA,Baikonur Cosmodrome,Proton,1999-053A,25924,"28 C-band, 16 Ku-band; business services, publ...",,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2469,Rascom-QAF 1R,Rascom-QAF 1R,NR,Multinational,Regional African Satellite Communications Orga...,Commercial,Communications,,GEO,,...,15.0,Thales Alenia Space,France,Guiana Space Center,Ariane 5,2010-037B,36831,Low cost communications for sub-Saharan Africa...,,
2828,Spektr-R/RadioAstron,Spektr-R/RadioAstron,Russia,Multinational,Astro Space Center of Moscow/Russian Academy o...,Government,Space Science,,Elliptical,Cislunar,...,5.0,Lavochkin,Russia,Baikonur Cosmodrome,Zenit 3SLBF/Fregat SB,2011-037A,37755,"Featuring a 30-foot (10-meter) wide antenna, ""...",,
6333,THEMIS A (Time History of Events and Macroscal...,THEMIS A,USA,Multinational,National Aeronautics and Space Administration ...,Government/Civil,Space Science,,Elliptical,,...,2.0,Swales Aerospace,USA,Cape Canaveral,Delta 2,2007-004A,30580,Five phase mission to study earth's auroras; m...,,
6334,THEMIS D (Time History of Events and Macroscal...,THEMIS D,USA,Multinational,National Aeronautics and Space Administration ...,Government/Civil,Space Science,,Elliptical,,...,2.0,Swales Aerospace,USA,Cape Canaveral,Delta 2,2007-004D,30797,Five phase mission to study earth's auroras; m...,,


In [156]:
print(sat_df.groupby(['Country of Operator/Owner', 'alpha-3', 'country-code']).size().sort_values(ascending = False).head(20))

Country of Operator/Owner  alpha-3  country-code
USA                        USA      840.0           4512
China                      CHN      156.0            587
United Kingdom             GBR      826.0            561
Russia                     RUS      643.0            177
Japan                      JPN      392.0             88
India                      IND      356.0             59
Canada                     CAN      124.0             56
Germany                    DEU      276.0             48
Luxembourg                 LUX      442.0             45
Argentina                  ARG      32.0              38
Israel                     ISR      376.0             27
Spain                      ESP      724.0             26
France                     FRA      250.0             24
Finland                    FIN      246.0             23
South Korea                KOR      410.0             21
Italy                      ITA      380.0             15
Switzerland                CHE      756

### Visualization Project Part 2: Sketching your Data
---
Your Module 1 discussion post identified some high-level goals for working with a dataset of interest to you. In this post, you will expand on those goals to characterize your target problem and develop some low-fidelity prototypes for working with that data. First, identify two to three tasks you wish to complete with your data, specifying for each:

1. Why is a task pursued? (goal)

2. How is a task conducted? (means)

3. What does a task seek to learn about the data? (characteristics)

4. Where does the task operate? (target data)

5. When is the task performed? (workflow)

6. Who is executing the task? (roles)

Then, sketch a set of preliminary low-fidelity prototypes for addressing these tasks with the given data. You may either sketch freeform or use the Five Design Sheets approach to generate these prototypes (hand-sketched on paper is fine). Upload a copy of your sketches as part of your post.

### Visualization Project Part 3: A Plan for Evaluation
---
In your previous post, you identified a series of tasks and goals for your visualization as well as some preliminary design ideas. We’ll jump ahead a few steps and start to think about how we might evaluate our design approach. Outline a preliminary evaluation that addresses your core goals with the visualization. Make sure your evaluation discusses: 

The target question you want to answer

The people you would recruit to answer that question

The kinds of measures you would use to answer your question (e.g., insight depth, use cases, accuracy) and what these measures would tell you about the core question

The approach you will use to answer that question (e.g., a journaling study, a formal experiment, etc.)

How you would instantiate those methods (i.e., what would your participants do?)

The criteria you would use to indicate that your visualization was successful
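
If the chosen measures include task accuracy and completion time, the results could be summarized per task along the lines of the sketch below. Everything in it is hypothetical: the participant IDs, the task labels, the logged values, and the 0.8 accuracy target standing in for a success criterion.

```python
import pandas as pd

# Hypothetical study log: one row per participant per task.
results = pd.DataFrame({
    'participant': ['P1', 'P1', 'P2', 'P2', 'P3', 'P3'],
    'task': ['T1', 'T2', 'T1', 'T2', 'T1', 'T2'],
    'correct': [True, False, True, True, True, False],
    'seconds': [30, 75, 42, 60, 36, 90],
})

summary = results.groupby('task').agg(
    accuracy=('correct', 'mean'),          # share of correct answers
    median_seconds=('seconds', 'median'),  # typical completion time
    participants=('participant', 'nunique'),
)
# Hypothetical success criterion: at least 80% of answers correct per task.
summary['meets_target'] = summary['accuracy'] >= 0.8
print(summary)
```

With only a handful of participants, such numbers are best read alongside qualitative notes rather than as statistical evidence.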

### About the Final Project
---
Throughout the Modules, you have found a dataset, characterized the corresponding goals and tasks you want to conduct with that data, designed preliminary approaches, and outlined how you would evaluate those approaches. For your final project, you will put these ideas into practice by executing on the project plan outlined in your prior posts.

For this project, you will implement a visualization using your data from Module 1 and preliminary low-fidelity prototypes from Module 2 to address your stated goals. You may implement this visualization using either Altair or another platform of your choice. Once implemented, conduct your evaluation based on the plan outlined in your Module 3 discussion post, making sure to conduct your evaluation with at least three people. You may refine any part of your prior plan to reflect your evolving understanding of the challenges you are addressing. Be sure to address how your plan has changed from these earlier posts as part of your discussion.

Your final project post should include: 

A brief recap of your data, goals, and tasks, focusing on those that most directly influence your design

Screenshots of and/or a link to your visualization implementation (see below for additional guidance)

A summary of the key elements of your design and accompanying justification

A discussion of your final evaluation approach, including the procedure, people recruited, and results. Note that, due to the difficulty of recruiting experts, you can use colleagues, friends, classmates, or family to evaluate your designs if experts or others from your target population are unavailable. 

A synthesis of your findings, including what elements of your approach worked well and what elements you would refine in future iterations.

Guidance and platforms for deploying Altair visualizations online include: 

Altair: Interactive Plots on the Web

Add Animated Charts To Your Dashboards With Streamlit-Python

Creating Interactive Jupyter Notebooks and Deployment on Heroku Using Voila

