# Homework 4: Data Visualization
- **Name**:  Sunil Uthukota
- **UB Username**: suniluth
- **UB Person Number**:  50468252

--- 
## Part 1 - Generate Plots According to Specifications

### Problem 1 - Scatter Plot with a Line

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.patches

data = np.genfromtxt("scatter_data.csv", delimiter=",", comments="%")
x = data[:, 0]
y = data[:, 1]
# Indices of the points with the smallest and largest x values
leftmost_idx = np.argmin(x)
rightmost_idx = np.argmax(x)

# Endpoints of the line joining the two extreme-x points
line_x = np.array([x[leftmost_idx], x[rightmost_idx]])
line_y = np.array([y[leftmost_idx], y[rightmost_idx]])

plt.scatter(x, y, marker="^", color="green", label="Observations")

plt.plot(line_x, line_y, color="red", linestyle="--", label="Extreme x points", zorder=0)

plt.legend(loc="upper left")
plt.title("Widget Measurements")
plt.xlabel("x [inches]")
plt.ylabel("y [inches]")
plt.show()

### Problem 2 - Histogram

In [None]:

data = np.genfromtxt("student_grades.csv", delimiter=",", comments="%")
student_grades = data[:, 1]


bin_edges = [0, 60, 70, 80, 90, 100]
letter_grades = ['F', 'D', 'C', 'B', 'A']


n, bins, patches = plt.hist(student_grades, bins=bin_edges, color="orange", edgecolor="black", rwidth=0.8)

plt.title("Grade Distribution")
plt.xlabel("Grade")
plt.ylabel("Count")


ax = plt.gca()


# Annotate each bar with its count, centered just above the bar
for bar in patches:
    height = int(bar.get_height())
    ax.text(bar.get_x() + bar.get_width() / 2, height, str(height), ha='center', va='bottom', fontsize=12)

bin_centers = [0.5 * (bin_edges[i] + bin_edges[i + 1]) for i in range(len(bin_edges) - 1)]
plt.xticks(bin_centers, letter_grades)


plt.show()


### Problem 3 - Barplot and Boxplot in the Same Figure

In [None]:
import pandas as pd

data = pd.read_csv("solution_data.csv", header=None, comment="%")

#first subplot
colors = {
    "optimal": "orange",
    "genetic algorithm": "orange",
    "simulated annealing": "orange",
    "tabu search": "orange",
}

max_value = data[2].max()


data["Optimality Gap"] = (max_value - data[2]) / max_value * 100


filtered_data = data[data[1].isin(colors.keys())]


average_gaps = filtered_data.groupby(1)["Optimality Gap"].mean()

fig, axs = plt.subplots(1, 2, figsize=(12, 6))

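# Look up each bar's color by heuristic name, since groupby sorts the index alphabetically
bar_colors = [colors[name] for name in average_gaps.index]
bars = axs[0].bar(average_gaps.index, average_gaps, color=bar_colors, edgecolor="black")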
axs[0].set_xticks(range(len(average_gaps.index)))
axs[0].set_xticklabels(average_gaps.index)
axs[0].set_xlabel("Heuristic")
axs[0].set_ylabel("Optimality Gap (%)")
axs[0].set_title("Mean Gaps",fontsize=8)

#Second subplot

boxplot_data = [filtered_data[filtered_data[1] == heuristic]["Optimality Gap"] for heuristic in colors.keys()]
medianprops = dict(color="orange")
box = axs[1].boxplot(boxplot_data, labels=colors.keys(), patch_artist=True, medianprops=medianprops)
axs[1].set_xlabel("Heuristic")

axs[1].set_title("Distribution Gaps",fontsize=8)

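# Both axes show gaps in percent, so share a y-range based on the largest gap
# (max_value is in objective units, not percent)
gap_max = filtered_data["Optimality Gap"].max()
axs[0].set_ylim([0, gap_max * 1.05])
axs[1].set_ylim([0, gap_max * 1.05])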

plt.tight_layout()

plt.suptitle("Comparison of Optimality Gaps for Heuristics", fontsize=10,y=1.05)
plt.show()


--- 
## Part 2 - Explore New Plot Types


Area Plot:

Type of data: Time series data

I will create an area plot of the total number of Russian personnel losses over time. Area plots are well-suited for showing trends over time: they are easy to read and interpret, and they can be used to compare trends across multiple groups. Here, the plot lets the viewer see how the number of personnel losses has changed over the course of the war.

Hexbin Plot:

Type of data: Bivariate data

Hexbin plots are well-suited for visualizing the relationship between two continuous variables. They are particularly useful for large datasets, where overlapping points in a scatter plot can hide patterns and trends.

In the case of the Russia-Ukraine war, a hexbin plot could be used to visualize the relationship between Russian personnel losses and prisoners of war (POWs). This plot would show whether the two variables are correlated and, if so, how strong the correlation appears.
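
A hexbin plot shows where observations concentrate, but it does not report a correlation strength as a number. One way that strength could be quantified is the Pearson coefficient. The sketch below is only illustrative: the arrays are made-up placeholder values, not figures from the Kaggle dataset, and the variable names are hypothetical.

```python
import numpy as np

# Made-up placeholder values (NOT taken from the Kaggle dataset):
# two running totals that both grow over time.
personnel = np.array([1000.0, 2500.0, 4300.0, 5800.0, 7000.0, 9200.0])
pow_count = np.array([0.0, 100.0, 200.0, 250.0, 400.0, 500.0])

# Pearson correlation coefficient: the off-diagonal entry of the
# 2x2 correlation matrix returned by np.corrcoef.
r = np.corrcoef(personnel, pow_count)[0, 1]
print(f"Pearson r = {r:.3f}")
```

Values of `r` near +1 or -1 indicate a strong linear relationship, and values near 0 a weak one. Two running totals tend to correlate strongly simply because both only increase, so a high `r` on cumulative columns should be read with caution.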

Scatter 3D Plot:

Type of data: Trivariate data

3D scatter plots are well-suited for visualizing the relationship between three continuous variables. They can be used to identify patterns and trends that would be difficult to see in a 2D scatter plot or a hexbin plot.

In the case of the Russia-Ukraine war, a 3D scatter plot could be used to visualize the relationship between Russian personnel losses, POWs, and the day of the war. This plot would show how the numbers of personnel losses and POWs have changed over the course of the war and help identify any patterns or trends.

- **URL of Example Code**:
  - https://matplotlib.org/stable/gallery/lines_bars_and_markers/fill_between_alpha.html
  - https://matplotlib.org/stable/plot_types/stats/hexbin.html#sphx-glr-plot-types-stats-hexbin-py
  - https://matplotlib.org/stable/gallery/mplot3d/scatter3d.html#sphx-glr-gallery-mplot3d-scatter3d-py
- **URL of Sample Data**: https://www.kaggle.com/datasets/piterfm/2022-ukraine-russian-war/data

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from ipywidgets import interact


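# Parse the 'date' column as datetimes so time-based plots get a proper date axis
df = pd.read_csv("russia_losses_personnel.csv", parse_dates=["date"])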

In [2]:
####Area Plot

df.set_index('date')['personnel'].plot(kind='area', figsize=(10,6),color ='red')
plt.title('Personnel over Time')
plt.xlabel('Date')
plt.ylabel('Personnel')
plt.show()


In [3]:
####Hexbin Plot
df.plot(kind='hexbin', x='personnel', y='POW', gridsize=30, figsize=(10,6))
plt.title('Personnel vs POW')
plt.xlabel('Personnel')
plt.ylabel('POW')
plt.show()

In [None]:
# 3D scatter plot (interactive, built with Plotly)
import plotly.express as px

# Uses the 'day', 'personnel', and 'POW' columns of df
fig = px.scatter_3d(df, x='day', y='personnel', z='POW',
                    color='day', size='personnel', opacity=0.9)

# Remove the figure margins
fig.update_layout(margin=dict(l=0, r=0, b=0, t=0))

fig.show()


- **Area plot**: This plot shows the total number of Russian personnel losses over time. The total has increased steadily since the start of the war.
- **Hexbin plot**: This plot shows the relationship between Russian personnel losses and prisoners of war (POWs). There is a positive correlation between the two variables: as the number of personnel losses increases, the number of POWs also increases.
- **3D plot**: This plot shows the relationship between Russian personnel losses, POWs, and the day of the war. Both personnel losses and POWs have increased over time, and the heaviest losses occurred in the early days of the war.
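
Statements about how quickly losses grew, or when the heaviest losses occurred, are easier to check on per-day changes than on running totals. The following is a minimal sketch, assuming the `personnel` column is a cumulative count (an assumption about the dataset, not confirmed here). It uses made-up numbers rather than the real data, and `demo` and `daily_change` are hypothetical names.

```python
import pandas as pd

# Made-up running totals (NOT from the Kaggle dataset); `day` and
# `personnel` mirror the column names used in the cells above.
demo = pd.DataFrame({
    "day": [1, 2, 3, 4, 5],
    "personnel": [100, 350, 500, 580, 620],
})

# A running total can only grow, so difference it to recover per-day changes.
demo["daily_change"] = demo["personnel"].diff()

# Day with the largest single-day increase (the NaN in the first row is skipped).
peak_day = demo.loc[demo["daily_change"].idxmax(), "day"]
print(demo)
print("Largest daily increase on day", peak_day)
```

Applied to the real DataFrame, the same `diff()` step would give a daily series that could be plotted next to the cumulative area plot.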

Conclusion:

Overall, the plots show that Russia has suffered significant personnel losses since the start of the war. They also show that personnel losses are positively correlated with the number of POWs, which suggests that Russia is losing a significant number of soldiers to capture.


