# Telling Stories With Data


**Version 0.1**

***
By Kalina Borkiewicz
8 Jan 2021

Find a dataset of your choice. This can be one you have visualized before, one you found online, or the sample data provided. Information is provided below for the sample data, but **you are encouraged to use your own data**. You are not required to complete the assignment in this Jupyter notebook; you can use any tool you are comfortable with.

Use the lessons from the lecture to complete the assignment below.

## Sample Dataset (skip if using your own data)

Download the [`hipgalv.LSR.csv`](https://northwestern.box.com/s/ykmt0tf4hdjbx61o6f83kv5qsyex91yb) file, which contains the Hipparcos star catalog, processed and described below by Stuart Levy. The star positions and velocities are given in Galactic coordinates relative to the Sun.

What the fields mean:

* x, y, z  -- star position, in parsecs.  Sun = 0,0,0.   z is approximately the altitude above/below the Galactic plane
* colorb_v and colorv_i -- two measures of color, "B-V" (blue vs yellow) and "V-I" (yellow vs near-infrared). More positive => more red. A white star has values near zero, a yellow sunlike star around 0.5, a very red star 1.5 or more.
* Mv -- estimated intrinsic luminosity in magnitude (logarithmic) units.   (Note that Mv is different from lower-case mv.)
* vx, vy, vz -- velocity components. vz is the vertical (above/below Galactic plane) velocity.
* speed -- magnitude of the (vx, vy, vz) velocity vector
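
As a quick check of the `speed` definition above, the sketch below recomputes the magnitude from the three velocity components. The small DataFrame holds made-up illustrative values, not rows from the catalog; with the real file you would apply the same expression to the DataFrame loaded from `hipgalv.LSR.csv`. The column name `speed_check` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in rows (NOT real catalog values), using the
# velocity column names described in the field list.
demo = pd.DataFrame({
    "vx": [3.0, -10.0, 0.0],
    "vy": [4.0, 5.0, 0.0],
    "vz": [0.0, 7.0, 7.0],
})

# Speed is the Euclidean norm of the velocity vector (vx, vy, vz).
demo["speed_check"] = np.sqrt(demo["vx"]**2 + demo["vy"]**2 + demo["vz"]**2)

print(demo)
```

With the real catalog, comparing a column computed this way against the provided `speed` column is one way to confirm you are reading the fields as intended.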

Things that might be neat to study:
* Is the altitude above/below the plane related to the star's color?
* What's the distribution of color vs Mv intrinsic luminosity? For reference, a fairly dim star might have Mv of +3 or +5 or so (the Sun is about +5); a fairly bright star might have Mv around 0; a superluminous one, Mv ~ -5.
* Do the B-V and V-I colors change their distribution for nearby stars vs more-distant ones? (more positive => more red) (You might adopt "nearby" as closer than 200 parsecs, say.)
* Is the distribution of velocities in the xy plane different for nearby stars than for more-distant stars?
* Are there noticeable clumps in the distribution of velocities in the xy plane?   (Can you detect star clusters from their space velocities?)
* Stars' orbits send them oscillating above and below the Galactic plane. We're catching them at arbitrary places in their orbits, but can use a combination of z and vz to estimate how high/low they'll go. This isn't an accurate formula, but each star's peak altitude (in parsecs, with vz in km/sec) might be something like 1.4*(vz^2) + abs(z), so that the Sun (with vz of about 7 km/sec) will have a peak altitude of about 80 parsecs.
* Given that, is there a relation between star color and that derived peak altitude?   (I'd expect that there'll be lower peak altitudes for bluer stars, higher ones for redder stars.)
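
The rough peak-altitude estimate and the "nearby" cut suggested in the list above can be sketched as follows. This is only an illustration under stated assumptions: the rows are made-up values rather than catalog data, and `add_peak_altitude`, `peak_alt`, `dist`, and `nearby` are hypothetical names introduced here. The 200-parsec threshold is the example value mentioned above.

```python
import numpy as np
import pandas as pd

def add_peak_altitude(stars: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the rough estimate 1.4*vz^2 + |z| in a new column."""
    out = stars.copy()
    out["peak_alt"] = 1.4 * out["vz"]**2 + out["z"].abs()
    return out

# Hypothetical rows (NOT catalog values) with the column names from the field list.
demo = pd.DataFrame({
    "x": [10.0, 300.0, -50.0],
    "y": [0.0, 400.0, 20.0],
    "z": [0.0, 120.0, -30.0],
    "vz": [7.0, -15.0, 2.0],
    "colorb_v": [0.65, 1.4, 0.1],
})
demo = add_peak_altitude(demo)

# Distance from the Sun (the origin), then flag stars closer than 200 parsecs.
demo["dist"] = np.sqrt(demo["x"]**2 + demo["y"]**2 + demo["z"]**2)
demo["nearby"] = demo["dist"] < 200.0

print(demo[["colorb_v", "peak_alt", "dist", "nearby"]])
```

From here, a scatter plot of `colorb_v` against `peak_alt`, or separate histograms for the `nearby` and distant groups, would be natural starting points for the questions above.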

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm

%matplotlib notebook

Here is a code sample showing how to read the data and draw a colored plot.

In [5]:
# Load the csv with pandas
df = pd.read_csv('hipgalv.LSR.csv', index_col=0)
#print(df)

# Use matplotlib's built-in "Reds" colormap. More colormaps and information here:
# https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html
# (plt.get_cmap is used because cm.get_cmap was removed in matplotlib 3.9)
cmap = plt.get_cmap('Reds')

# Make the plot
ax = df.plot.scatter(x='z',
                     y='Mv',
                     c='colorb_v',
                     cmap=cmap,
                     title="Sample Visualization")

# Set the x-axis label
ax.set_xlabel("Altitude")

# Set the y-axis label
ax.set_ylabel("Intrinsic Luminosity")

# Set the colormap label
f = plt.gcf()
cax = f.get_axes()[1]
cax.set_ylabel('B-V Color');
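
The sample plot above draws Mv with larger values toward the top. Because a smaller (more negative) Mv means a more luminous star, plots of color against magnitude are often drawn with the magnitude axis inverted so that bright stars sit at the top. The sketch below shows one way to do that with `invert_yaxis()`. It uses synthetic stand-in values rather than the catalog, and the variable names `color_bv` and `abs_mag` are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in values (NOT catalog data): a loose trend plus noise.
rng = np.random.default_rng(0)
color_bv = rng.uniform(-0.2, 1.8, size=200)
abs_mag = 5.0 * color_bv + rng.normal(0.0, 1.0, size=200)

fig, ax = plt.subplots()
sc = ax.scatter(color_bv, abs_mag, c=color_bv, cmap="Reds", s=10)
ax.set_xlabel("B-V Color")
ax.set_ylabel("Mv (magnitudes)")

# Smaller magnitude means brighter, so flip the axis to put bright stars on top.
ax.invert_yaxis()

fig.colorbar(sc, ax=ax, label="B-V Color")
```

Whether to invert the axis depends on the audience: astronomers may expect it, while a general audience may find "up means a bigger number" easier to read.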

## Problem 1) Tell different stories to different audiences

Create at least two different visualizations from your dataset to answer the following prompts. Vary the labels/captions/title if you use the same visualization for a different prompt.

Note: You do not need to complete this entire problem in this Jupyter notebook. If you would like to add annotations, make a flow chart, show an overview + detail, or anything more complex than a basic chart, feel free to start your visualization here (or elsewhere), and complete it in PowerPoint, Photoshop, Paint, GIMP, or whatever tool you are comfortable working with.

**Prompt #1: Tell a story to your peers in astronomy.**

In [28]:
# Read RA, Dec, and magnitude from the space-separated text file
ra, dec, mag = [], [], []
with open('APER_CALC_MAGS.txt', 'r') as file_read:
    for line in file_read:
        data = line.split(' ')
        value = float(data[4])
        # Skip 99.999 entries (presumably a missing-value flag) and non-positive values
        if value != 99.999 and value > 0.:
            ra.append(float(data[2]))
            dec.append(float(data[3]))
            mag.append(value)
# Convert the lists to NumPy arrays for plotting
mag = np.array(mag)
ra = np.array(ra)
dec = np.array(dec)

In [71]:
plt.figure(figsize=(6,8))

# Color each point by its magnitude
sc = plt.scatter(ra, dec, s=15,
                 c=mag, cmap='seismic', alpha=1)
plt.xlabel('RA')
plt.ylabel('DEC')
plt.colorbar(sc, label='Magnitude')
plt.title('6-pc DASH Survey')
#plt.savefig('scatter_mag.pdf')
plt.show()

**Prompt #2: Tell a story to a fifth grader.**

In [70]:
plt.figure(figsize=(6,8))

# Same plot as above, with a single-hue colormap
sc = plt.scatter(ra, dec, s=15,
                 c=mag, cmap='Purples', alpha=1)
plt.xlabel('RA')
plt.ylabel('DEC')
plt.colorbar(sc)
plt.title('6-pc DASH Survey')
plt.show()

**Prompt #3: Tell a story using the ten hundred most common words in the English language** (use this tool: https://xkcd.com/simplewriter/ )

Note: This should only require a change in title/caption/labels, not in the visualization itself.
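
One way to change only the title, caption, and labels while leaving the visualization itself untouched is to keep the text for each audience in a small lookup table. The sketch below is a minimal illustration under stated assumptions: the data are made-up values, `LABELS` and `plot_for` are hypothetical names, and the "simple" wording is only a guess that has not been checked against the simplewriter tool.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical label sets, one per audience. The "simple" wording is a guess
# and has NOT been checked against the simplewriter tool.
LABELS = {
    "peers": {"title": "Source positions colored by magnitude",
              "xlabel": "RA", "ylabel": "DEC", "cbar": "Magnitude"},
    "simple": {"title": "Where the stars are in the sky",
               "xlabel": "Left to right", "ylabel": "Bottom to top",
               "cbar": "How bright"},
}

def plot_for(audience, ra, dec, mag):
    """Draw the same scatter plot, changing only the text for the audience."""
    text = LABELS[audience]
    fig, ax = plt.subplots(figsize=(6, 8))
    sc = ax.scatter(ra, dec, c=mag, s=15, cmap="Purples")
    ax.set_title(text["title"])
    ax.set_xlabel(text["xlabel"])
    ax.set_ylabel(text["ylabel"])
    fig.colorbar(sc, ax=ax, label=text["cbar"])
    return fig, ax

# Made-up positions and magnitudes (NOT survey data).
rng = np.random.default_rng(1)
ra_demo = rng.uniform(0.0, 1.0, size=50)
dec_demo = rng.uniform(0.0, 1.0, size=50)
mag_demo = rng.uniform(15.0, 25.0, size=50)

# One figure per audience: identical points, different words.
figs = {name: plot_for(name, ra_demo, dec_demo, mag_demo) for name in LABELS}
```

Keeping the plotting code in one function makes it easy to confirm that only the wording differs between versions.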

**Prompt #4: Tell a story to a government policymaker who is considering cutting funding for your field.**

## Problem 2) Miscommunication

Choose one of the following two assignments to complete. If you are feeling ambitious, complete both as an optional **Challenge Problem**.

**Prompt Option A: Tell a false, but believable, story with your data**

In [72]:
plt.figure(figsize=(6,8))

# Color the points by RA instead of magnitude
sc = plt.scatter(ra, dec, s=15,
                 c=ra, cmap='Purples', alpha=1)
plt.xlabel('RA')
plt.ylabel('DEC')
plt.colorbar(sc)
plt.title('6-pc DASH Survey')
plt.show()

**Prompt Option B: Review the visualizations you created in Problem 1. How could they be misinterpreted?** Use the space below to write your answer in "markdown" mode.

## Problem 3) Audience Testing

Show one or two of your visualizations to your friends, peers, family members, or other groups of your choice. Did they understand your story? Did they care? How would you change your visualization(s) based on this feedback? Use the space below to write your answer in "markdown" mode.

## Challenge Problem (Optional)

Using what you learned in Problem 2 and Problem 3, revise your visualization(s).