# SAO/LIP Python Primer Course Exercise Set 7

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/acorreia61201/SAOPythonPrimer/blob/main/exercises/Exercises7.ipynb)

This exercise set is split into two parts. The first part deals with basic Python I/O, while the second does some more robust numerical analyses using `pandas`.

## Exercise 1.1: Simple File Manipulation

Let's get some practice reading and writing files using Python builtins, as well as navigating directories. Each file that you generate below should have one word per line.

**Your task:** Create a directory called `my_files`. Move to this directory and create two new directories inside it, `dogs` and `fruits`. Check by using `os.listdir()`.
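
As a reminder of the relevant `os` calls, here is a minimal sketch that builds a small directory tree and lists it. The names `demo_root`, `alpha`, and `beta` are placeholders, not the names the task asks for, and the work happens inside a temporary directory so your workspace is left untouched.

```python
import os
import tempfile

start_dir = os.getcwd()                  # remember where we started

# Hypothetical names: demo_root, alpha, and beta are placeholders only.
with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)                    # work in a throwaway location
    os.mkdir("demo_root")                # create the parent directory
    os.chdir("demo_root")                # move into it
    for name in ("alpha", "beta"):
        os.mkdir(name)                   # create each subdirectory
    created = sorted(os.listdir())       # list the current directory
    print(created)
    os.chdir(start_dir)                  # step back out before cleanup
```

`os.listdir()` with no argument lists the current working directory, so it doubles as a check that `os.chdir()` did what you expected.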

In [6]:
# YOUR CODE HERE

**Your task:** Generate two files:
- Create a file `dog_list.txt` using the list `dog_breeds` below, and place it in `dogs`.
- Create a file `fruit_list.txt` using the list `fruit_types` below, and place it in `fruits`.
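
One possible pattern for writing a list to a file with one word per line is sketched below. The list `colors`, the directory `palette`, and the file `color_list.txt` are made-up stand-ins, so you will need to adapt the names and paths for the task.

```python
import os
import tempfile

colors = ["red", "green", "blue"]        # hypothetical stand-in list

with tempfile.TemporaryDirectory() as scratch:
    os.mkdir(os.path.join(scratch, "palette"))
    path = os.path.join(scratch, "palette", "color_list.txt")

    # Mode 'w' creates the file, or overwrites it if it already exists.
    with open(path, "w") as f:
        for color in colors:
            f.write(color + "\n")        # one word per line

    # Read the file back to confirm what was written.
    with open(path) as f:
        contents = f.read()

print(contents)
```

Writing directly to a path inside the target directory avoids having to move the file afterwards.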

In [7]:
dog_breeds = ['pug', 'schnauzer', 'shiba inu', 'poodle', 'greyhound', 'chihuahua', 'great dane']
fruit_types = ['apple', 'banana', 'blueberry', 'strawberry', 'grape', 'pineapple', 'cantaloupe']

# YOUR CODE HERE

**Your task:** Create a new file `reversed_dogs.txt`. Read in the contents of `dog_list.txt` and write them in reverse order in this new file. Move this file to the `dogs` directory. (Don't just use the lists above; you should use `open()` to get the contents.)
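
The read, reorder, write, and move steps can be sketched as follows, assuming a small made-up file `numbers.txt` and a made-up target directory `archive`. This is only an illustration of the calls involved, not the required solution.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    # Build a small input file to work with (hypothetical contents).
    src = os.path.join(scratch, "numbers.txt")
    with open(src, "w") as f:
        f.write("one\ntwo\nthree\n")

    # Read the words back; splitlines() drops the trailing newlines.
    with open(src) as f:
        words = f.read().splitlines()

    # Write the words in reverse order to a new file.
    out = os.path.join(scratch, "reversed_numbers.txt")
    with open(out, "w") as f:
        f.write("\n".join(reversed(words)) + "\n")

    # Move the new file into an existing directory.
    os.mkdir(os.path.join(scratch, "archive"))
    moved = shutil.move(out, os.path.join(scratch, "archive"))

    with open(moved) as f:
        reversed_words = f.read().splitlines()

    in_alpha_order = sorted(words)       # sorted() gives alphabetical order

print(reversed_words)
print(in_alpha_order)
```

Swapping `reversed(words)` for `sorted(words)` gives alphabetical order with the same read and write steps.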

In [8]:
# YOUR CODE HERE

**Your task:** Create a new file `alphabet_fruits.txt`. Read in the contents of `fruit_list.txt` and write them in alphabetical order in this new file. Move this file to the `fruits` directory.

In [None]:
# YOUR CODE HERE

**Your task:** Write a loop below that reads in `dogs/dog_list.txt` and prints out the names of dogs that contain the letter `a`. Write these names to a new file, `dog_count.txt`, in the current working directory (i.e. `my_files`).

In [None]:
# YOUR CODE HERE

**Your task:** Write a loop that reads in `fruits/fruit_list.txt` and prints out the names of fruits with more than 7 letters. Write these names to a new file, `fruit_count.txt`, in the current working directory (i.e. `my_files`).
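
Both filtering tasks follow the same shape: loop over the lines, test a condition, then print and write the matches. A minimal sketch using a made-up file `animals.txt` and a made-up condition (words containing `o`):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    src = os.path.join(scratch, "animals.txt")       # hypothetical input
    dst = os.path.join(scratch, "animals_with_o.txt")
    with open(src, "w") as f:
        f.write("cat\nhorse\nrabbit\nowl\n")

    matches = []
    # Open the input for reading and the output for writing together.
    with open(src) as infile, open(dst, "w") as outfile:
        for line in infile:
            word = line.strip()          # remove the trailing newline
            if "o" in word:              # the condition to test
                print(word)
                outfile.write(word + "\n")
                matches.append(word)

    with open(dst) as f:
        written = f.read().splitlines()
```

A length test such as `len(word) > 5` can replace the `in` test without changing the rest of the loop. Note that `len()` counts every character, including spaces and any newline you have not stripped.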

In [None]:
# YOUR CODE HERE

## Exercise 1.2: Find and Replace Substrings in Files

In this exercise, we'll be using the Gettysburg Address file that was downloaded in the lecture. If you need to download or re-download the file, run the cell below (be sure to rename it if necessary).

In [None]:
!wget https://collincapano.com/wp-content/uploads/2023/01/gettysburg_address-bliss_copy.txt

**Your task:** Write a loop below that searches the file for the string 'here' (don't worry about capitalization; none of the occurrences of 'here' are capitalized). If a line contains 'here', print that line to the screen along with its line number. Treat the first line as line 1. At the end, print out the total number of occurrences of 'here' in the file.
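
A sketch of the search pattern, using an in-memory stand-in (`io.StringIO`) in place of a real file and a made-up target string `'the'`. A file object returned by `open()` can be looped over in the same way.

```python
import io

# Hypothetical text standing in for an opened file.
sample = io.StringIO("a cat sat\non the mat\nthe theme\n")
target = "the"

total = 0
hits = []
# enumerate(..., start=1) numbers the lines beginning at 1 instead of 0.
for lineno, line in enumerate(sample, start=1):
    n = line.count(target)               # occurrences on this line
    if n > 0:
        print(lineno, line.rstrip())
        hits.append(lineno)
        total += n

print("occurrences:", total)
```

Here the third line contributes two matches, because substring matching also counts the `the` inside `theme`. Counting lines and counting occurrences are therefore not always the same thing.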

In [1]:
# YOUR CODE HERE

**Your task:** Now, write a loop that replaces every occurrence of 'here' with 'there'. You will need to make use of the method `readobj.replace()`, where `readobj` is a placeholder for the string returned by `file.read()` for a given `file`. The first argument of `replace()` is the string you wish to replace, and the second is the string you want to replace it with; the method returns a new string and leaves the original unchanged. (If you need help, see the examples at https://linuxhint.com/python-replaces-string-file/.) Write the changes to the same file, and use `read()` to check that the changes are correct.
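
The read, replace, and write-back cycle might look like the sketch below. The file `note.txt` and the strings `'cold'` and `'hot'` are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "note.txt")         # hypothetical file
    with open(path, "w") as f:
        f.write("cold tea\ncold soup\n")

    # Read the whole file into one string.
    with open(path) as f:
        text = f.read()

    # replace() returns a new string; 'text' itself is unchanged.
    new_text = text.replace("cold", "hot")

    # Reopen the same file in 'w' mode to overwrite it.
    with open(path, "w") as f:
        f.write(new_text)

    # Read it back to check the result.
    with open(path) as f:
        check = f.read()

print(check)
```

Reading everything before reopening in `'w'` mode matters, since opening a file for writing empties it immediately.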

In [3]:
# YOUR CODE HERE

Let's now try the same thing with multiple files. Run the cell below to download a directory of files I made. If you run `os.listdir()`, you should see several files that end in `.ini`.

In [1]:
!wget https://github.com/acorreia61201/SAOPythonPrimer/tree/a0653c521ebab4fbbd6104def791cbda3978d5c7/ini_files

--2023-05-31 14:25:59--  https://github.com/acorreia61201/SAOPythonPrimer/tree/a0653c521ebab4fbbd6104def791cbda3978d5c7/ini_files
Resolving github.com (github.com)... 140.82.112.4
Connecting to github.com (github.com)|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘ini_files’

ini_files               [ <=>                ] 132.09K  --.-KB/s    in 0.08s   

2023-05-31 14:25:59 (1.57 MB/s) - ‘ini_files’ saved [135264]



**Your task:** Each file contains lines with the path `'/work/pi_ccapano_umassd_edu/acorreia7_umassd_edu'`, my home directory on a cluster I use to do research. Use a loop to find and replace each instance of this string in the files you've downloaded with your current working directory. 

A good check would be to first print out how many instances of my string are in the files. Then, once you do your find/replace procedure, you should find that the files have the exact same number of your string and none of mine.
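
The count, replace, and recount check can be sketched on a few generated files. Everything here is hypothetical: the paths `old` and `new`, the file names `run0.ini` and so on, and the helper `count_in_files`. In the exercise, `new` would be the string returned by `os.getcwd()`.

```python
import glob
import os
import tempfile

old = "/old/home/user"       # hypothetical path to replace
new = "/new/home/me"         # in the exercise: os.getcwd()

def count_in_files(paths, text):
    """Total number of times `text` appears across all files in `paths`."""
    total = 0
    for p in paths:
        with open(p) as f:
            total += f.read().count(text)
    return total

with tempfile.TemporaryDirectory() as scratch:
    # Generate three small .ini files that each mention `old` twice.
    for i in range(3):
        with open(os.path.join(scratch, f"run{i}.ini"), "w") as f:
            f.write(f"output = {old}/run{i}\nlog = {old}/logs\n")

    # Collect only the files ending in .ini.
    paths = sorted(glob.glob(os.path.join(scratch, "*.ini")))
    before = count_in_files(paths, old)

    # Find and replace in every file.
    for p in paths:
        with open(p) as f:
            text = f.read()
        with open(p, "w") as f:
            f.write(text.replace(old, new))

    after_old = count_in_files(paths, old)
    after_new = count_in_files(paths, new)

print(before, after_old, after_new)
```

The check passes when `after_old` is zero and `after_new` equals `before`.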

In [5]:
# YOUR CODE HERE

## Exercise 2.1: Practice with Loading and Plotting Data

We'll do a couple simple examples involving plotting various datasets.

Download the following data:

In [4]:
!wget https://www.gw-openscience.org/GW150914data/P150914/fig1-observed-H.txt

--2023-05-31 22:54:48--  https://www.gw-openscience.org/GW150914data/P150914/fig1-observed-H.txt
Resolving www.gw-openscience.org (www.gw-openscience.org)... 131.215.113.73
Connecting to www.gw-openscience.org (www.gw-openscience.org)|131.215.113.73|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://gwosc.org/GW150914data/P150914/fig1-observed-H.txt [following]
--2023-05-31 22:54:48--  https://gwosc.org/GW150914data/P150914/fig1-observed-H.txt
Resolving gwosc.org (gwosc.org)... 131.215.113.73
Connecting to gwosc.org (gwosc.org)|131.215.113.73|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 173833 (170K) [text/plain]
Saving to: ‘fig1-observed-H.txt’


2023-05-31 22:54:49 (720 KB/s) - ‘fig1-observed-H.txt’ saved [173833/173833]



This is the initial data release for GW150914, the first direct detection of a gravitational wave. The first column lists the times of the analysis in seconds, and the second column lists the *strain*, a technical term for the wave amplitude, in units of $10^{-21}$. GW analysis is a fairly complicated topic, so for now we're just going to plot the data.

**Your task:** Load the data from above using `pandas`. We need to make use of some keywords to make it work:
- This isn't strictly a comma-separated list, so we'll have to use the `delim_whitespace = True` keyword. (Recent `pandas` releases deprecate this keyword in favor of the equivalent `sep=r'\s+'`.)
- We'll also have to use the `skiprows` and `header = None` keywords to skip the first row of the dataset. 
- Since there are no labels, we need to set our own labels for the `Series` objects. We can do this using `names=[col1, col2]`, where col1 and col2 are the strings representing your column titles.

Check that you read in the data correctly. The data should run from 0.25 to ~0.46 seconds.
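
The keywords above can be tried out on a tiny in-memory table first. The sketch below uses made-up values and made-up column names (`time`, `value`), and uses `sep=r"\s+"`, which splits on whitespace in the same way as `delim_whitespace=True`.

```python
import io
import pandas as pd

# Hypothetical whitespace-separated data with one comment row on top.
raw = io.StringIO(
    "# time (s) value\n"
    "0.0 1.5\n"
    "0.1 -2.0\n"
    "0.2 0.5\n"
)

df = pd.read_csv(
    raw,
    sep=r"\s+",                  # split columns on any run of whitespace
    skiprows=1,                  # skip the first (comment) row
    header=None,                 # the remaining rows contain no header
    names=["time", "value"],     # supply our own column labels
)

print(df.head())
print(df["time"].min(), df["time"].max())
```

Printing the minimum and maximum of the time column is a quick way to confirm that the data covers the range you expect.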

In [None]:
# YOUR CODE HERE

**Your task:** Plot the strain data versus time.

## Exercise 2.2: Analyzing Solar System Orbits

Download the following dataset:

In [None]:
!wget 

**Your task:** Use `pandas` to load in and view the data in this csv file.

In [None]:
# YOUR CODE HERE

This dataset contains some orbital parameters for the planets in the solar system (and Pluto). The distance column gives the average distance to the Sun in kilometers, `period_rot` gives the rotational period in Earth days, and `period_orb` gives the orbital period in Earth years. The last column gives the *eccentricity* of each planet, a parameter that describes how elliptical the planets' orbits are. For bound orbits, this value ranges between 0 and 1, with 0 representing a perfect circle, and increasing values representing a more "stretched-out" orbit.

As you may expect, the distance measurements are very large, into the billions for the farthest planets. In solar system astronomy, it's convenient to use *astronomical units*, or *AU*, to measure distances. We can use the conversion $1\ \mathrm{AU} = 149597870.7\ \mathrm{km}$ to convert the values in our dataset.

**Your task:** Modify the `DataFrame` above so that the distance column is in units of AU. As a sanity check, 1 AU is defined as the average distance from the Earth to the Sun, so the Earth's distance should be something close to 1.
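
Rescaling a column in place is a one-line operation in `pandas`. The sketch below uses a small hand-built `DataFrame` with approximate mean distances for Earth and Mars; the column names `name` and `distance` are assumptions and may differ from those in the downloaded file.

```python
import pandas as pd

KM_PER_AU = 149597870.7

# Hypothetical two-row table with approximate mean distances in km.
df = pd.DataFrame({
    "name": ["Earth", "Mars"],
    "distance": [149.6e6, 227.9e6],
})

# Dividing a column by a scalar applies to every row at once.
df["distance"] = df["distance"] / KM_PER_AU

print(df)
```

Because the assignment overwrites the column, running the cell twice would divide twice. Reloading the data before rerunning avoids that.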

In [None]:
# YOUR CODE HERE

Using this data, we can do a simple verification of *Kepler's third law*. In its simplest form, it reads:

\begin{equation}
P \propto a^{3/2}
\end{equation}

This means that the period $P$ of a planet's orbit is proportional to the *semimajor axis* $a$ raised to the power of $3/2$. We define the semimajor axis as the average of the maximum and minimum distances of a planet from the Sun.

**Your task:** Generate a scatterplot of $P$ versus $a^{3/2}$ using your dataset. Plot this on a loglog scale and label your axes 'Period (Earth yrs)' and 'Semimajor axis (AU)'.

On a loglog scale, the above proportionality will produce a straight line. Does your plot match this?
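
As a quick numerical sanity check of the proportionality, here is a sketch for a single planet using approximate textbook values for Jupiter (typed in by hand, not read from the dataset). With $a$ in AU and $P$ in Earth years, the Earth sits at $(1, 1)$, so the constant of proportionality is 1.

```python
import math

# Approximate values for Jupiter, entered by hand (assumed, not from the CSV).
a_jupiter = 5.203            # semimajor axis in AU
P_jupiter = 11.86            # orbital period in Earth years

# Kepler's third law in these units: P = a**1.5
predicted = a_jupiter ** 1.5

# Slope of the line joining Earth (1, 1) and Jupiter on a log-log plot.
slope = math.log(P_jupiter) / math.log(a_jupiter)

print(predicted, slope)
```

The slope comes out close to $3/2$, which is the power in Kepler's third law: taking logarithms of $P \propto a^{3/2}$ gives $\log P = \tfrac{3}{2}\log a + \text{const}$.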

In [None]:
# YOUR CODE HERE

The *mean orbital velocity* is the average speed at which an object orbits the Sun. There are two ways we can calculate this. By Newton's law of universal gravitation, and assuming the mass of the planet is much less than that of the Sun, we can calculate the velocity of an object with average distance $a$ as:

\begin{equation}
v = \sqrt{\frac{GM}{a}}
\end{equation}

$G = 6.6743 \times 10^{-11}\ \mathrm{m^3\,kg^{-1}\,s^{-2}}$ is *Newton's gravitational constant*, and $M = 2 \times 10^{30}\ \mathrm{kg}$ is the mass of the Sun.

We can also approximate the velocity using $a$, $P$, and eccentricity $e$ with the following (this comes from evaluating an *elliptic integral*, an advanced calculus concept we won't get into):

\begin{equation}
v \approx \frac{2\pi a}{P}\bigg[ 1 - \frac{e^2}{4} \bigg]
\end{equation}

**Your task:** For each planet, calculate its approximate orbital velocity using the second equation. Express your velocities in $\mathrm{m/s}$, i.e. convert $a$ to meters ($1\ \mathrm{AU} = 149597870700\ \mathrm{m}$, $1\ \mathrm{km} = 1000\ \mathrm{m}$) and $P$ to seconds ($1\ \mathrm{year} = 3.154 \times 10^7\ \mathrm{s}$). You may reload the dataset if you'd prefer to convert from kilometers rather than AU.
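
To see how the unit conversions fit together, here is a worked sketch for the Earth alone, evaluating both velocity formulas. The eccentricity `e = 0.0167` is an approximate value typed in by hand rather than read from the dataset.

```python
import math

G = 6.6743e-11               # m^3 kg^-1 s^-2
M_SUN = 2e30                 # kg
M_PER_AU = 149597870.7e3     # meters in one AU (km value times 1000)
S_PER_YEAR = 3.154e7         # seconds in one year

# Earth, with an approximate hand-entered eccentricity (assumption).
a = 1.0 * M_PER_AU           # semimajor axis in meters
P = 1.0 * S_PER_YEAR         # orbital period in seconds
e = 0.0167

# First equation: Newtonian estimate.
v_newton = math.sqrt(G * M_SUN / a)

# Second equation: circumference over period, corrected for eccentricity.
v_approx = (2 * math.pi * a / P) * (1 - e**2 / 4)

print(v_newton, v_approx)
```

Both values should land near 30 km/s and agree to within about a percent. For a whole column of planets, the same expressions work on `pandas` columns without an explicit loop.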

In [2]:
# YOUR CODE HERE

**Your task:** Plot your points above as a scatterplot of velocity versus $a$. On the same axes, plot the first equation at 500 points over the range $a \in [10^{10}, 10^{13}]\ \mathrm{m}$, which spans the orbits in the dataset. Label the axes accordingly. How well do the data points match the Newtonian model?

In [3]:
# YOUR CODE HERE