### Q1(30 point): Implement a class for n-sided polygons and a class for points in a Euclidean system, namely polygon and point respectively. For example, a 4-sided polygon can be defined by 4 points P1, P2, P3, P4, and P1-P4 are each points of the form point(X,Y), and X and Y are coordinates on the X and Y axis, respectively. The edges are listed counterclockwise starting at the lower left: P1 to P2, P2 to P3, P3 to P4, and P4 to P1. The polygon class should work for polygons of any number of edges and have a function perimeter that returns its perimeter (sum of the lengths of the edges). (20points)

In [None]:
import numpy as np
import pandas as pd
import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Polygon:
    def __init__(self, points):
        self.points = points

    def edge_length(self, p1, p2):
        #uses pythagorean theorem to calculate the length
        return math.sqrt((p1.x - p2.x) ** 2 + (p1.y - p2.y) ** 2)

    def perimeter(self):
    
        num_points = len(self.points)
        total_length = 0
        # loop to find the perimeter by summing all the edges
        for i in range(num_points):
            total_length += self.edge_length(self.points[i], self.points[(i + 1) % num_points])

        return round(total_length, 1)

point1 = Point(1, 1)
point2 = Point(1, 2)
point3 = Point(2, 2)

triangle = Polygon([point1, point2, point3])

point4 = Point(2, 1)
point5 = Point(2, 3)
point6 = Point(6, 3)
point7 = Point(4, 1)

four_sided = Polygon([point4, point5, point6, point7])

print(f"The perimeter of the triangle is {triangle.perimeter()}")
print(f"The perimeter of the foursided polygon is {four_sided.perimeter()}")


The perimeter of the triangle is 3.4
The perimeter of the foursided polygon is 10.8


### Use Pandas to load both data/AIS/transit_segments.csv, and data/AIS/vessel_information.csv. Show the first 5 rows of each dataset to inspect it.(10points)

In [4]:
transit_data = pd.read_csv('transit_segments.csv')
vessel_data = pd.read_csv('vessel_information.csv')

transit_data.head(5)

Unnamed: 0,mmsi,name,transit,segment,seg_length,avg_sog,min_sog,max_sog,pdgt10,st_time,end_time
0,1,Us Govt Ves,1,1,5.1,13.2,9.2,14.5,96.5,2/10/09 16:03,2/10/09 16:27
1,1,Dredge Capt Frank,1,1,13.5,18.6,10.4,20.6,100.0,4/6/09 14:31,4/6/09 15:20
2,1,Us Gov Vessel,1,1,4.3,16.2,10.3,20.5,100.0,4/6/09 14:36,4/6/09 14:55
3,1,Us Gov Vessel,2,1,9.2,15.4,14.5,16.1,100.0,4/10/09 17:58,4/10/09 18:34
4,1,Dredge Capt Frank,2,1,9.2,15.4,14.6,16.2,100.0,4/10/09 17:59,4/10/09 18:35


In [5]:
vessel_data.head(5)

Unnamed: 0,mmsi,num_names,names,sov,flag,flag_type,num_loas,loa,max_loa,num_types,type
0,1,8,Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...,Y,Unknown,Unknown,7,42.0/48.0/57.0/90.0/138.0/154.0/156.0,156.0,4,Dredging/MilOps/Reserved/Towing
1,9,3,000000009/Raven/Shearwater,N,Unknown,Unknown,2,50.0/62.0,62.0,2,Pleasure/Tug
2,21,1,Us Gov Vessel,Y,Unknown,Unknown,1,208.0,208.0,1,Unknown
3,74,2,Mcfaul/Sarah Bell,N,Unknown,Unknown,1,155.0,155.0,1,Unknown
4,103,3,Ron G/Us Navy Warship 103/Us Warship 103,Y,Unknown,Unknown,2,26.0/155.0,155.0,2,Tanker/Unknown


### For data/AIS/vessel_information.csv, keep only those rows with the type value occurring for at least 100 times in the datase

In [13]:
type_counts = vessel_data['type'].value_counts()

df_filtered = vessel_data[vessel_data['type'].isin(type_counts[type_counts >= 100].index)]

df_filtered

Unnamed: 0,mmsi,num_names,names,sov,flag,flag_type,num_loas,loa,max_loa,num_types,type
2,21,1,Us Gov Vessel,Y,Unknown,Unknown,1,208.0,208.0,1,Unknown
3,74,2,Mcfaul/Sarah Bell,N,Unknown,Unknown,1,155.0,155.0,1,Unknown
5,310,1,Arabella,N,Bermuda,Foreign,1,47.0,47.0,1,Unknown
6,3011,1,Charleston,N,Anguilla,Foreign,1,160.0,160.0,1,Other
7,4731,1,000004731,N,Yemen (Republic of),Foreign,1,30.0,30.0,1,Unknown
...,...,...,...,...,...,...,...,...,...,...,...
10762,866946820,1,Catherine Turecamo,N,Unknown,Unknown,2,0.0/33.0,33.0,1,Tug
10764,888888888,1,Earl Jones,N,Unknown,Unknown,1,40.0,40.0,1,Towing
10766,919191919,1,Oi,N,Unknown,Unknown,1,20.0,20.0,1,Pleasure
10768,975318642,1,Island Express,N,Unknown,Unknown,1,20.0,20.0,1,Towing


### Merge data/AIS/vessel_information.csv and data/AIS/transit_segments.csv on the "mmsi" column using outer join

In [15]:
merged_df = pd.merge(transit_data, vessel_data, on='mmsi', how='inner')

### If you are not allowed to call the inner join provided by Pandas but have the above outer join results, how to get the results of inner join? You can use other functions provided by Pandas (but not a function that directly implements the inner join). (10points)

In [21]:
inner_join1 = merged_df.dropna()
inner_join1

Unnamed: 0,mmsi,name,transit,segment,seg_length,avg_sog,min_sog,max_sog,pdgt10,st_time,...,num_names,names,sov,flag,flag_type,num_loas,loa,max_loa,num_types,type
0,1,Us Govt Ves,1,1,5.1,13.2,9.2,14.5,96.5,2/10/09 16:03,...,8,Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...,Y,Unknown,Unknown,7,42.0/48.0/57.0/90.0/138.0/154.0/156.0,156.0,4,Dredging/MilOps/Reserved/Towing
1,1,Dredge Capt Frank,1,1,13.5,18.6,10.4,20.6,100.0,4/6/09 14:31,...,8,Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...,Y,Unknown,Unknown,7,42.0/48.0/57.0/90.0/138.0/154.0/156.0,156.0,4,Dredging/MilOps/Reserved/Towing
2,1,Us Gov Vessel,1,1,4.3,16.2,10.3,20.5,100.0,4/6/09 14:36,...,8,Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...,Y,Unknown,Unknown,7,42.0/48.0/57.0/90.0/138.0/154.0/156.0,156.0,4,Dredging/MilOps/Reserved/Towing
3,1,Us Gov Vessel,2,1,9.2,15.4,14.5,16.1,100.0,4/10/09 17:58,...,8,Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...,Y,Unknown,Unknown,7,42.0/48.0/57.0/90.0/138.0/154.0/156.0,156.0,4,Dredging/MilOps/Reserved/Towing
4,1,Dredge Capt Frank,2,1,9.2,15.4,14.6,16.2,100.0,4/10/09 17:59,...,8,Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...,Y,Unknown,Unknown,7,42.0/48.0/57.0/90.0/138.0/154.0/156.0,156.0,4,Dredging/MilOps/Reserved/Towing
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
262348,999999999,Triple Attraction,3,1,5.3,20.0,19.6,20.4,100.0,6/15/10 12:49,...,1,Triple Attraction,N,Unknown,Unknown,1,30.0,30.0,1,Pleasure
262349,999999999,Triple Attraction,4,1,18.7,19.2,18.4,19.9,100.0,6/15/10 21:32,...,1,Triple Attraction,N,Unknown,Unknown,1,30.0,30.0,1,Pleasure
262350,999999999,Triple Attraction,6,1,17.4,17.0,14.7,18.4,100.0,6/17/10 19:16,...,1,Triple Attraction,N,Unknown,Unknown,1,30.0,30.0,1,Pleasure
262351,999999999,Triple Attraction,7,1,31.5,14.2,13.4,15.1,100.0,6/18/10 2:52,...,1,Triple Attraction,N,Unknown,Unknown,1,30.0,30.0,1,Pleasure


# Now directly call the inner join provided by Pandas, check whether your results above are exactly the same.(10points)

In [28]:
inner_join2 = pd.merge(transit_data, vessel_data, on='mmsi', how='inner')

In [27]:
are_equal = inner_join1.equals(inner_join2)
print(are_equal)

True
