# Creation of Constellation Connections

This notebook generates the lines that connect the stars within a single constellation.

## Importing the necessary packages

In [30]:
import pandas as pd
import numpy as np
import os
from pathlib import Path
import ast

## Setting up the important directories

The cell below sets up the folders used by this notebook. `Data Input` is where the input data should be placed. Inside `Data Processed`, two subfolders, `HYG Constellation` and `Constellation Connection`, are created to hold the processed information.

In [31]:
working_dir = Path.cwd()

data_input_dir = working_dir / "Data Input"
data_process_dir = working_dir / "Data Processed"

hyg_constellation_dir = data_process_dir / "HYG Constellation"
constellation_connection_dir = data_process_dir / "Constellation Connection"

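# Create the input folder and both output folders; parents=True also creates
# "Data Processed" itself, and exist_ok=True makes the cell safe to re-run
for directory in (data_input_dir, hyg_constellation_dir, constellation_connection_dir):
    directory.mkdir(parents=True, exist_ok=True)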

## Extracting information

The following cells describe how the star information is processed in this notebook.

### constellationship.fab
---
Place the `constellationship.fab` file in the `Data Input` folder.

The `constellationship.fab` file lists the stars that make up the line figure of each constellation. It was obtained from `eleanorlutz`'s GitHub [repository](https://github.com/eleanorlutz/western_constellations_atlas_of_space). Although that repository attributes the file to the open-source planetarium software *Stellarium*, the file is no longer available in Stellarium's repository, so I used `eleanorlutz`'s copy instead.

The relevant data in the file consists of connection pairs, written as a sequence of star numbers that indicates which stars are connected. An illustrative example is the sequence:

$$1\;2\;2\;3\;3\;4\;4\;1$$

To interpret the connections, the elements are read in consecutive pairs, giving four pairs in this example. In each pair, the first element designates the starting star and the second the ending star; here, the connections form a closed loop. The sequence must therefore contain an even number of elements, since each star has to be paired with another.
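
A minimal sketch of this pairing rule; `to_pairs` is a hypothetical helper for illustration, not a function used elsewhere in the notebook. It takes the even-indexed elements as start stars and the odd-indexed elements as end stars, and rejects odd-length sequences.

```python
def to_pairs(sequence):
    """Split a flat sequence into (start, end) pairs: items 0/1, 2/3, ..."""
    if len(sequence) % 2 != 0:
        # An odd-length sequence cannot be split into complete pairs
        raise ValueError(f"expected an even number of elements, got {len(sequence)}")
    # Even-indexed items are start stars, odd-indexed items are end stars
    return list(zip(sequence[0::2], sequence[1::2]))


print(to_pairs([1, 2, 2, 3, 3, 4, 4, 1]))
# [(1, 2), (2, 3), (3, 4), (4, 1)]
```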

While processing the data, I found that the entry for the **CMa** constellation contains an odd number of elements. Inspecting the entry showed that star **34045** is paired with itself, as seen in the first two elements of the first list printed by the next cell. To fix this, I dropped the first element of the sequence.

In [32]:
rows = []
with open(str(data_input_dir / "constellationship.fab"), "r") as fabfile:
    rows = fabfile.readlines()

entry_list = []

for row in rows:
    row = row.replace("\n", "")
    row_split = row.split(" ")

    name = row_split[0]
    star_numbers = row_split[3:]

    if len(star_numbers) % 2 != 0:

        print(f"{name} | {len(star_numbers)}")
        print(star_numbers)
        star_numbers = star_numbers[1:]
        print(star_numbers)

    entry = {"Constellation": name, "Star Numbers": str(star_numbers)}
    entry_list.append(entry)

CMa | 33
['34045', '34045', '33347', '33347', '32349', '32349', '33977', '33977', '34444', '34444', '35037', '35037', '35904', '33579', '33856', '33856', '34444', '33856', '33165', '33165', '31592', '31592', '31416', '31592', '30324', '31592', '32349', '33579', '32759', '30122', '33579', '33347', '33160']
['34045', '33347', '33347', '32349', '32349', '33977', '33977', '34444', '34444', '35037', '35037', '35904', '33579', '33856', '33856', '34444', '33856', '33165', '33165', '31592', '31592', '31416', '31592', '30324', '31592', '32349', '33579', '32759', '30122', '33579', '33347', '33160']
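
As a quick sanity check of the fix, the corrected CMa list printed above can be split into pairs: it now has an even length and no star is connected to itself. This only checks for self-pairs; it does not prove the pairs match the intended figure.

```python
# Corrected CMa sequence, copied from the second list printed above
cma = ['34045', '33347', '33347', '32349', '32349', '33977', '33977', '34444',
       '34444', '35037', '35037', '35904', '33579', '33856', '33856', '34444',
       '33856', '33165', '33165', '31592', '31592', '31416', '31592', '30324',
       '31592', '32349', '33579', '32759', '30122', '33579', '33347', '33160']

# Pair consecutive elements: 0/1, 2/3, ...
pairs = list(zip(cma[0::2], cma[1::2]))
self_pairs = [pair for pair in pairs if pair[0] == pair[1]]

print(len(cma), len(pairs), self_pairs)  # 32 16 []
```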


## Creating a dataframe containing all the star pairs
---
The next cell creates a dataframe containing, for each constellation, the sequence of star numbers that defines its links, and saves it to a `csv` file for use in later cells. Saving it is not strictly necessary, but I opted to do so in case it is useful for other purposes.

In [33]:
columns = ["Constellation", "Star Numbers"]

# Build the frame in one step instead of concatenating row by row
star_frame = pd.DataFrame(entry_list, columns=columns)
star_frame = star_frame.sort_values(by=["Constellation"])

output_path = data_process_dir / "Constellation_Star_List.csv"
star_frame.to_csv(str(output_path), index=False)

### Preparing the HYG dataset
---
Place the `hygdata_v3.csv` file in the `Data Input` folder.

I chose to use the same dataset as `eleanorlutz`, because the star numbers in the `constellationship.fab` file are Hipparcos (HIP) catalogue numbers, which the HYG database provides in its `hip` column. The next cell loads `hygdata_v3.csv` into a dataframe.

In [34]:
hyg_frame = pd.read_csv(str(data_input_dir / "hygdata_v3.csv"))

The dataframe listing the star numbers of each constellation (`star_frame`) is still available in this notebook. However, since the processed data was saved for potential reuse, as mentioned above, the next cell reloads it from the CSV file instead.
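
The star numbers are stored in the CSV as the string form of a Python list, so after reloading they must be parsed back, which is what `ast.literal_eval` does in a later cell. A small round-trip sketch with made-up constellation names and star numbers, using an in-memory buffer instead of a file:

```python
import ast
import io

import pandas as pd

# Hypothetical two-row frame in the same shape as star_frame
frame = pd.DataFrame({
    "Constellation": ["AAA", "BBB"],
    "Star Numbers": [str(["1", "2", "3", "4"]), str(["5", "6"])],
})

buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# After reloading, the column holds plain text such as "['1', '2', '3', '4']"
raw = reloaded.loc[0, "Star Numbers"]
print(type(raw).__name__)  # str

# ast.literal_eval safely turns that text back into a Python list
star_numbers = ast.literal_eval(raw)
print(star_numbers)  # ['1', '2', '3', '4']
```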

In [35]:
star_frame = pd.read_csv(str(data_process_dir / "Constellation_Star_List.csv"))

## Extracting the relevant stars
---
Only the stars that take part in the connections are needed, so most of the HYG database is left unused. Every star is nonetheless assigned to a constellation: the International Astronomical Union (IAU) divides the sky into 88 regions, each corresponding to a recognized constellation, and a star located within a region belongs to that region's constellation.

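The stars to keep are selected using the star numbers stored in the `Constellation_Star_List.csv` file. Each constellation's stars are written, in the same order as its sequence, to a separate CSV file in the `HYG Constellation` folder.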
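
An alternative lookup, sketched on a made-up miniature table: indexing by `hip` and selecting with `.loc` returns the rows in the requested order, keeps repeated stars, and raises `KeyError` for a star number that is missing, instead of silently returning no row and shifting the pairing. `select_stars` is a hypothetical helper. The toy table uses integer HIP numbers; the real `hip` column in `hygdata_v3.csv` also has empty values, so applying this to the real data may require dropping rows without a `hip` number first (an untested assumption).

```python
import pandas as pd

# Hypothetical miniature stand-in for hygdata_v3.csv (values are made up)
hyg_sample = pd.DataFrame({
    "hip": [10, 20, 30],
    "ra": [1.0, 2.0, 3.0],      # hours, as in HYG
    "dec": [10.0, 20.0, 30.0],  # degrees
})


def select_stars(hyg, star_numbers):
    """Return the rows for the given HIP numbers, in sequence order.

    Repeated numbers give repeated rows, so rows 0/1, 2/3, ... still form the
    connection pairs. A number absent from the table raises KeyError.
    """
    indexed = hyg.set_index("hip")
    return indexed.loc[[int(n) for n in star_numbers]].reset_index()


selected = select_stars(hyg_sample, ["30", "10", "10", "20"])
print(selected["hip"].tolist())  # [30, 10, 10, 20]
```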

In [36]:
for _, row in star_frame.iterrows():
    constellation_name = row["Constellation"]

    # The star numbers were saved as the string form of a Python list
    constellation_star_list = ast.literal_eval(row["Star Numbers"])

    # Look up each star by its Hipparcos number, keeping the sequence order
    # (and repeated stars) so that consecutive rows still form the pairs
    extracted_rows = [hyg_frame[hyg_frame["hip"] == int(star_number)]
                      for star_number in constellation_star_list]
    constellation_frame = pd.concat(extracted_rows, ignore_index=True)

    output_path = hyg_constellation_dir / f"{constellation_name}.csv"
    constellation_frame.to_csv(str(output_path), index=False)

### Extracting the Right Ascension (RA) and Declination (Dec)
---
On the surface of the Earth, a point can be specified by its latitude and longitude. Objects on the celestial sphere are located the same way: declination plays the role of latitude, and right ascension the role of longitude.
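
Written out for the unit sphere used in this notebook, with $\alpha$ the right ascension and $\delta$ the declination (both in degrees), the Cartesian coordinates of a star are:

$$x = \cos\delta\,\cos\alpha, \qquad y = \cos\delta\,\sin\alpha, \qquad z = \sin\delta$$

Inverting these gives the relations used by `xyz_to_radec` in the Functions cell, where $\operatorname{atan2}$ is the two-argument arctangent (`np.arctan2`):

$$\alpha = \operatorname{atan2}(y,\,x), \qquad \delta = \arcsin z$$

Since HYG gives right ascension in hours, the angle in degrees is $\alpha = 15 \times \mathrm{RA}_{\mathrm{h}}$.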

With the relevant stars extracted, working on the connections is now possible. The important part is extracting the right ascension and declination values of each star.

But before that, a couple of functions will be defined in the next cell.

### Functions
---
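The `norm` function scales a vector to unit length.

The `xyz_to_radec` function converts a point on the unit sphere to right ascension and declination, both in radians.

The `compute_connect` function connects two points on the unit sphere along the geodesic between them. It samples 99 points along the connection and drops the first and last, so the 97 returned points (RA and Dec, in degrees) stop just short of both stars.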

In [37]:
def norm(x, y, z):
    magnitude = np.sqrt(x**2 + y**2 + z**2)
    x_new = x/magnitude
    y_new = y/magnitude
    z_new = z/magnitude
    return x_new, y_new, z_new


def xyz_to_radec(x, y, z):
    ra = np.arctan2(y, x)
    dec = np.arcsin(z)
    return ra, dec


def compute_connect(x1, y1, z1, x2, y2, z2):
    # Sample the chord between the two stars, then project each sample
    # onto the unit sphere to obtain points along the geodesic
    interval = 96
    buffer = 2
    half_buffer = buffer // 2
    n_points = interval + buffer + 1  # 99 samples, including both stars

    x_diff = np.linspace(x1, x2, n_points)
    y_diff = np.linspace(y1, y2, n_points)
    z_diff = np.linspace(z1, z2, n_points)

    # norm works element-wise on arrays, pushing every sample onto the sphere
    x_new, y_new, z_new = norm(x_diff, y_diff, z_diff)

    # Drop one sample at each end, so the line stops just short of both stars
    trimmed = slice(half_buffer, -half_buffer)
    ra, dec = xyz_to_radec(x_new[trimmed], y_new[trimmed], z_new[trimmed])

    # .tolist() gives plain Python floats, so str() on these lists stays
    # readable by ast.literal_eval (under NumPy >= 2, str() of a list of
    # NumPy scalars shows entries such as np.float64(...))
    return np.rad2deg(ra).tolist(), np.rad2deg(dec).tolist()

### Actual extraction
---
In the next cell, the processed CSV file of each constellation is loaded and its stars are connected in pairs: rows 0 and 1 form the first connection, rows 2 and 3 the second, and so on. For each pair, the right ascension and declination of both stars are extracted.

These values are then converted to x, y, z coordinates on the unit sphere. The right ascension (RA) values are first multiplied by 15, because the HYG data stores RA in hours (0 to 24) while the conversion works in degrees (0 to 360); declination (Dec) is already in degrees. Once both stars of a pair have been converted, the straight segment (chord) between their x, y, z positions is used to build the connection.
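
The next cell repeats the conversion formulas inline for each star. As a sketch, the same conversion can be factored into one function; `radec_to_xyz` is a hypothetical name, not something the notebook defines.

```python
import numpy as np


def radec_to_xyz(ra_hours, dec_deg):
    """Convert HYG-style RA (hours) and Dec (degrees) to a unit vector."""
    ra = np.radians(ra_hours * 15)  # hours -> degrees -> radians
    dec = np.radians(dec_deg)
    return np.cos(dec) * np.cos(ra), np.cos(dec) * np.sin(ra), np.sin(dec)


# RA 6 h is 90 degrees; on the celestial equator this is the +y axis
x, y, z = radec_to_xyz(6.0, 0.0)
print(np.round([x, y, z], 12))  # approximately [0, 1, 0]
```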

This chord is sampled at evenly spaced points. Each sampled point, treated as a vector from the origin, is then normalized, which projects it radially onto the surface of the sphere. Doing this for every sampled point yields points along the geodesic (the shorter great-circle arc) connecting the two stars.
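
The same chord-and-normalize idea can be written with whole NumPy arrays instead of a per-point loop. This is a sketch with a hypothetical `sample_geodesic` helper taking unit vectors; it keeps both endpoints (no trimming) and assumes the two points are not antipodal, since the chord would then pass through the origin and the division would fail.

```python
import numpy as np


def sample_geodesic(p1, p2, n_points=99):
    """Points on the arc between unit vectors p1 and p2, obtained by
    sampling the chord evenly and normalizing each sample."""
    t = np.linspace(0.0, 1.0, n_points)[:, None]            # column of weights
    chord = (1 - t) * np.asarray(p1) + t * np.asarray(p2)   # evenly spaced on the chord
    # Dividing each row by its length projects it radially onto the sphere
    return chord / np.linalg.norm(chord, axis=1, keepdims=True)


points = sample_geodesic(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]), n_points=5)
print(np.linalg.norm(points, axis=1))  # all 1.0, up to rounding
print(np.degrees(np.arctan2(points[:, 1], points[:, 0])))  # about 0, 18.4, 45, 71.6, 90
```

Even spacing on the chord does not give even spacing on the arc: in the example the angular steps are widest in the middle. For the short arcs between neighbouring stars the difference is small; spherical linear interpolation (slerp) would give uniform angular spacing if that mattered.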

The points on the sphere are then converted back to right ascension and declination, this time both in degrees, and saved to one CSV file per constellation in the `Constellation Connection` folder, with one row per connection.
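
One detail of the back-conversion: `np.arctan2` returns angles between -180° and 180°, so recalculated RA values for stars beyond 12 h come back negative, whereas the input RA (after the ×15 step) lies between 0° and 360°. If a later step expects the 0° to 360° convention, `np.mod` is one way to wrap the values; the numbers below are example values, not notebook output.

```python
import numpy as np

# Example RA values (degrees) as np.arctan2 might return them
ra_deg = np.array([-90.0, -0.5, 0.0, 45.0, 180.0])

# Wrap into [0, 360) to match the input convention
ra_wrapped = np.mod(ra_deg, 360.0)
print(ra_wrapped.tolist())  # [270.0, 359.5, 0.0, 45.0, 180.0]
```

Either convention leaves a seam somewhere: with wrapping, a line crossing RA 0 h jumps from near 360° to near 0°, so the plotting step has to handle that crossing.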

In [38]:
%%capture
for constellation_file in hyg_constellation_dir.iterdir():
    constellation_frame = pd.read_csv(constellation_file)

    # Keep the row order: rows 0/1, 2/3, ... are the connection pairs
    ra_list = constellation_frame["ra"].tolist()
    dec_list = constellation_frame["dec"].tolist()

    entry_list = []

    twice_pairs = len(ra_list)
    count = 0

    while count < twice_pairs:

        start_RA = float(ra_list[count])
        end_RA = float(ra_list[count+1])

        start_Dec = float(dec_list[count])
        end_Dec = float(dec_list[count+1])

        x1 = np.cos(np.radians(start_Dec)) * np.cos(np.radians(start_RA*15))
        y1 = np.cos(np.radians(start_Dec)) * np.sin(np.radians(start_RA*15))
        z1 = np.sin(np.radians(start_Dec))

        x2 = np.cos(np.radians(end_Dec)) * np.cos(np.radians(end_RA*15))
        y2 = np.cos(np.radians(end_Dec)) * np.sin(np.radians(end_RA*15))
        z2 = np.sin(np.radians(end_Dec))

        new_ra_list, new_dec_list = compute_connect(x1, y1, z1, x2, y2, z2)

        new_entry = {'Connection': count/2, 'RA Points': str(
            new_ra_list), "Dec Points": str(new_dec_list)}

        entry_list.append(new_entry)

        count += 2

    columns = ["Connection", "RA Points", "Dec Points"]

    # Build the frame in one step instead of concatenating row by row
    connection_frame = pd.DataFrame(entry_list, columns=columns)

    output_path = constellation_connection_dir / \
        f"{constellation_file.stem}.csv"
    connection_frame.to_csv(str(output_path), index=False)

# Extra part of the notebook

## Combining all the information

Storing the connections for each constellation in separate files is not necessary. For instance, consider the constellation **Crux**, which comprises four stars and two connections. Take the following sequence as an example:

$$1\;2\;3\;4$$

These are not the actual star numbers of **Crux**, but the sequence shows that the connections do not need to be continuous: star **1** is paired with **2**, and star **3** is paired with **4**. Because the line from **1** to **2** is roughly perpendicular to the line from **3** to **4**, the drawn lines form a cross.
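
A small sketch of why this works: pairing the cross-shaped sequence gives two separate segments, and appending a second, made-up constellation's sequence simply adds its segments after them. `to_segments` is a hypothetical helper, and the numbers are the illustrative ones from the text, not real HIP numbers.

```python
def to_segments(sequence):
    """Pair a flat star sequence into (start, end) segments: 0/1, 2/3, ..."""
    return list(zip(sequence[0::2], sequence[1::2]))


cross = [1, 2, 3, 4]        # the illustrative sequence from the text
loop = [5, 6, 6, 7, 7, 5]   # a second, made-up constellation

print(to_segments(cross))         # [(1, 2), (3, 4)]
print(to_segments(cross + loop))  # [(1, 2), (3, 4), (5, 6), (6, 7), (7, 5)]
```

This relies on every individual sequence having an even length: a single odd-length sequence would shift every pair that follows it, which is why the CMa entry had to be fixed first.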

Since each pair is drawn on its own, the sequences of different constellations can likewise be combined into a single list, even though the constellations are not connected to each other.

The following cells do that.

In [39]:
ra_list = []
dec_list = []

for constellation_file in hyg_constellation_dir.iterdir():
    constellation_frame = pd.read_csv(constellation_file)

    # Append this constellation's rows, keeping their pair order
    ra_list.extend(constellation_frame["ra"].tolist())
    dec_list.extend(constellation_frame["dec"].tolist())

In [40]:
entry_list = []

twice_pairs = len(ra_list)
count = 0

while count < twice_pairs:

    start_RA = float(ra_list[count])
    end_RA = float(ra_list[count+1])

    start_Dec = float(dec_list[count])
    end_Dec = float(dec_list[count+1])

    x1 = np.cos(np.radians(start_Dec)) * np.cos(np.radians(start_RA*15))
    y1 = np.cos(np.radians(start_Dec)) * np.sin(np.radians(start_RA*15))
    z1 = np.sin(np.radians(start_Dec))

    x2 = np.cos(np.radians(end_Dec)) * np.cos(np.radians(end_RA*15))
    y2 = np.cos(np.radians(end_Dec)) * np.sin(np.radians(end_RA*15))
    z2 = np.sin(np.radians(end_Dec))

    new_ra_list, new_dec_list = compute_connect(x1, y1, z1, x2, y2, z2)

    new_entry = {'Connection': count/2, 'RA Points': str(
        new_ra_list), "Dec Points": str(new_dec_list)}

    entry_list.append(new_entry)

    count += 2

columns = ["Connection", "RA Points", "Dec Points"]

# Build the frame in one step instead of concatenating row by row
connection_frame = pd.DataFrame(entry_list, columns=columns)

output_path = data_process_dir / "Singular.csv"
connection_frame.to_csv(str(output_path), index=False)



## Extracting positions of the unique stars

The following cells extract the positions of the unique stars. How this information is used is described in the `map_create.ipynb` notebook.

In [41]:
# RA is converted from hours to degrees (x 15); Dec is already in degrees
star_tuple_list = [(ra * 15, dec) for ra, dec in zip(ra_list, dec_list)]

print(f"Tuples generated: {len(star_tuple_list)}")
star_tuple_set = set(star_tuple_list)
print(f"Unique tuples: {len(star_tuple_set)}")

Tuples generated: 1346
Unique tuples: 691
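
The count drops from 1346 to 691 because each row of `ra_list` is one end of a connection, so a star appears once for every connection it belongs to. A `set`, as used above, removes the repeats but does not keep the order in which stars were first seen. If a first-seen order is preferred for the output CSV, `dict.fromkeys` is one alternative; the sketch below uses made-up positions.

```python
import pandas as pd

# Made-up (RA degrees, Dec degrees) tuples; repeated stars appear more than once
positions = [(10.0, 5.0), (20.0, 6.0), (10.0, 5.0), (30.0, 7.0), (20.0, 6.0)]

# dict keys are unique and keep insertion order, so this removes duplicates
# while preserving the order in which each star was first seen
unique_positions = list(dict.fromkeys(positions))
print(len(positions), len(unique_positions))  # 5 3

position_frame = pd.DataFrame(unique_positions, columns=["RA", "Dec"])
print(position_frame.shape)  # (3, 2)
```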


In [42]:
# One row per unique star position
entry_list = [{"RA": ra, "Dec": dec} for ra, dec in star_tuple_set]

The positions of the unique stars (RA and Dec, both in degrees) are stored in the `Constellation_Star_Position_List.csv` file.

In [43]:
%%capture
columns = ["RA", "Dec"]

# Build the frame in one step instead of concatenating row by row
new_frame = pd.DataFrame(entry_list, columns=columns)

output_path = data_process_dir / "Constellation_Star_Position_List.csv"
new_frame.to_csv(str(output_path), index=False)