## Question 1
Implement a class for n-sided polygons and a class for points in a Euclidean system, namely polygon and point respectively. 

For example, a 4-sided polygon can be defined by 4 points P1, P2, P3, P4, where each of P1 through P4 is a point of the form point(X,Y), and X and Y are its coordinates on the X and Y axes, respectively.

The edges are listed counterclockwise starting at the lower left: P1 to P2, P2 to P3, P3 to P4, and P4 to P1. 

The polygon class should work for polygons with any number of edges and have a function perimeter that returns its perimeter (the sum of the lengths of the edges). (20 points)

Hint: use the Pythagorean theorem: if a line segment Z starts at (X1,Y1) and ends at (X2,Y2), the length of Z is the square root of (X1-X2)^2 + (Y1-Y2)^2.

Example: 
The perimeter of the polygon/triangle on point(1,1), point(1,2), and point(2,2) is 3.4; 
The perimeter of the 4-sided polygon on point(2,1), point(2,3), point(6,3), and point(4,1) is 10.8. (10 points)

The length \(Z\) of the line segment is given by:

$$
Z = \sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}
$$
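
As a worked instance of this formula, here are the three edges of the example triangle with \(P_1 = (1,1)\), \(P_2 = (1,2)\), and \(P_3 = (2,2)\). The labels \(P_1\) to \(P_3\) are only assigned here for illustration, following the order in which the points are listed.

$$
\begin{aligned}
|P_1P_2| &= \sqrt{(1 - 1)^2 + (2 - 1)^2} = 1 \\
|P_2P_3| &= \sqrt{(2 - 1)^2 + (2 - 2)^2} = 1 \\
|P_3P_1| &= \sqrt{(1 - 2)^2 + (1 - 2)^2} = \sqrt{2} \\
\text{perimeter} &= 1 + 1 + \sqrt{2} \approx 3.41
\end{aligned}
$$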

In [83]:
import math

In [84]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    
    def display(self):
        # Show the point as a coordinate pair, e.g. Point(1, 2)
        print(f"Point({self.x}, {self.y})")

In [88]:
class Polygon: 
    def __init__(self, points):
        # Reject invalid input instead of only printing a message
        if len(points) < 3:
            raise ValueError("A polygon needs at least 3 points")
        self.points = points
    
    def length(self, p1, p2):
        return math.sqrt((p2.x - p1.x) ** 2 + (p2.y - p1.y) ** 2)
    
    def perimeter(self):
        n = len(self.points)
        perimeter = 0
        for i in range(n):
            perimeter += self.length(self.points[i], self.points[(i+1)%n])
        return perimeter

In [89]:
p1 = Point(1,1)
p2 = Point(1,2)
p3 = Point(2,2)
Triangle = Polygon([p1,p2,p3])
print("Perimeter of triangle:", Triangle.perimeter())

Perimeter of triangle: 3.414213562373095


In [87]:
p1 = Point(2,1)
p2 = Point(2,3)
p3 = Point(6,3)
p4 = Point(4,1)
Quadrilateral = Polygon([p1,p2,p3,p4])
print("Perimeter of quadrilateral:", Quadrilateral.perimeter())

Perimeter of quadrilateral: 10.82842712474619
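
As an independent cross-check of the two results, the perimeter can also be computed without the classes. This is only a sketch: the helper name `perimeter` and the use of plain `(x, y)` tuples are assumptions made for the check, and `math.dist` requires Python 3.8 or later.

```python
import math

def perimeter(vertices):
    """Sum of the edge lengths of a closed polygon given as ordered (x, y) tuples."""
    # Pair each vertex with the next one; the last vertex wraps around to the first.
    pairs = zip(vertices, vertices[1:] + vertices[:1])
    return sum(math.dist(a, b) for a, b in pairs)

triangle = [(1, 1), (1, 2), (2, 2)]
quadrilateral = [(2, 1), (2, 3), (6, 3), (4, 1)]

print(round(perimeter(triangle), 1))       # 3.4
print(round(perimeter(quadrilateral), 1))  # 10.8
```

Rounded to one decimal place, both values agree with the expected answers in the question.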


## Question 2 
Use Pandas to load both data/AIS/transit_segments.csv and data/AIS/vessel_information.csv. Show the first 5 rows of each dataset to inspect it. (10 points)

For data/AIS/vessel_information.csv, keep only the rows whose type value occurs at least 100 times in the dataset. (10 points)

Merge data/AIS/vessel_information.csv and data/AIS/transit_segments.csv on the "mmsi" column using an outer join. (10 points)

If you are not allowed to call the inner join provided by Pandas but have the above outer join results, how can you get the results of an inner join? You can use other functions provided by Pandas (but not a function that directly implements the inner join). (10 points)

Now directly call the inner join provided by Pandas and check whether your results above are exactly the same. (10 points)

In [11]:
import numpy as np
import pandas as pd

transit_seg = pd.read_csv('AIS/transit_segments.csv')
vessel_info = pd.read_csv('AIS/vessel_information.csv')

### Transit Segments

In [62]:
transit_seg.head()

Unnamed: 0,mmsi,name,transit,segment,seg_length,avg_sog,min_sog,max_sog,pdgt10,st_time,end_time
0,1,Us Govt Ves,1,1,5.1,13.2,9.2,14.5,96.5,2/10/09 16:03,2/10/09 16:27
1,1,Dredge Capt Frank,1,1,13.5,18.6,10.4,20.6,100.0,4/6/09 14:31,4/6/09 15:20
2,1,Us Gov Vessel,1,1,4.3,16.2,10.3,20.5,100.0,4/6/09 14:36,4/6/09 14:55
3,1,Us Gov Vessel,2,1,9.2,15.4,14.5,16.1,100.0,4/10/09 17:58,4/10/09 18:34
4,1,Dredge Capt Frank,2,1,9.2,15.4,14.6,16.2,100.0,4/10/09 17:59,4/10/09 18:35


### Vessel Information

In [63]:
vessel_info.head()

Unnamed: 0,mmsi,num_names,names,sov,flag,flag_type,num_loas,loa,max_loa,num_types,type
0,1,8,Bil Holman Dredge/Dredge Capt Frank/Emo/Offsho...,Y,Unknown,Unknown,7,42.0/48.0/57.0/90.0/138.0/154.0/156.0,156.0,4,Dredging/MilOps/Reserved/Towing
1,9,3,000000009/Raven/Shearwater,N,Unknown,Unknown,2,50.0/62.0,62.0,2,Pleasure/Tug
2,21,1,Us Gov Vessel,Y,Unknown,Unknown,1,208.0,208.0,1,Unknown
3,74,2,Mcfaul/Sarah Bell,N,Unknown,Unknown,1,155.0,155.0,1,Unknown
4,103,3,Ron G/Us Navy Warship 103/Us Warship 103,Y,Unknown,Unknown,2,26.0/155.0,155.0,2,Tanker/Unknown


### Counting vessels by type

In [57]:
type_count = vessel_info.type.value_counts()
type_count

Cargo                      5622
Tanker                     2440
Pleasure                    601
Tug                         221
Sailing                     205
                           ... 
AntiPol/Other                 1
Fishing/Law                   1
Cargo/Other/Towing            1
Cargo/Fishing                 1
Fishing/Reserved/Towing       1
Name: type, Length: 206, dtype: int64

### Vessel types with a count of 100 or more

In [73]:
types_100 = type_count[type_count >= 100].index.tolist()
types_100

['Cargo',
 'Tanker',
 'Pleasure',
 'Tug',
 'Sailing',
 'Fishing',
 'Other',
 'Passenger',
 'Towing',
 'Unknown']

### Filtering vessel_info to types with a count of at least 100

In [61]:
vessel_info_filtered = vessel_info[vessel_info.type.isin(types_100)]
vessel_info_filtered

Unnamed: 0,mmsi,num_names,names,sov,flag,flag_type,num_loas,loa,max_loa,num_types,type
2,21,1,Us Gov Vessel,Y,Unknown,Unknown,1,208.0,208.0,1,Unknown
3,74,2,Mcfaul/Sarah Bell,N,Unknown,Unknown,1,155.0,155.0,1,Unknown
5,310,1,Arabella,N,Bermuda,Foreign,1,47.0,47.0,1,Unknown
6,3011,1,Charleston,N,Anguilla,Foreign,1,160.0,160.0,1,Other
7,4731,1,000004731,N,Yemen (Republic of),Foreign,1,30.0,30.0,1,Unknown
...,...,...,...,...,...,...,...,...,...,...,...
10762,866946820,1,Catherine Turecamo,N,Unknown,Unknown,2,0.0/33.0,33.0,1,Tug
10764,888888888,1,Earl Jones,N,Unknown,Unknown,1,40.0,40.0,1,Towing
10766,919191919,1,Oi,N,Unknown,Unknown,1,20.0,20.0,1,Pleasure
10768,975318642,1,Island Express,N,Unknown,Unknown,1,20.0,20.0,1,Towing
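
The same filter can be written in one step by mapping each row's type to its frequency. The sketch below uses a small made-up DataFrame (`toy`) and a threshold of 2 instead of 100 so that it runs without the AIS files; those names and values are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for vessel_info; the values are made up for illustration.
toy = pd.DataFrame({
    "mmsi": [1, 2, 3, 4, 5, 6],
    "type": ["Cargo", "Cargo", "Cargo", "Tug", "Tug", "Sailing"],
})

min_count = 2  # the assignment uses 100; 2 keeps the toy example small

# Map each row's type to how often that type occurs, then keep the frequent ones.
counts_per_row = toy["type"].map(toy["type"].value_counts())
toy_filtered = toy[counts_per_row >= min_count]

print(toy_filtered["type"].tolist())  # ['Cargo', 'Cargo', 'Cargo', 'Tug', 'Tug']
```

This avoids building the intermediate list of types, at the cost of being slightly less explicit about which types survive.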


### Merging filtered vessel_info and transit_segments on column mmsi using outer join

In [67]:
outer_merged_data = pd.merge(vessel_info_filtered, transit_seg, on='mmsi', how='outer')
outer_merged_data

Unnamed: 0,mmsi,num_names,names,sov,flag,flag_type,num_loas,loa,max_loa,num_types,...,name,transit,segment,seg_length,avg_sog,min_sog,max_sog,pdgt10,st_time,end_time
0,21,1.0,Us Gov Vessel,Y,Unknown,Unknown,1.0,208.0,208.0,1.0,...,Us Gov Vessel,2,1,48.7,6.6,3.4,16.3,38.4,3/14/11 16:13,3/15/11 0:02
1,21,1.0,Us Gov Vessel,Y,Unknown,Unknown,1.0,208.0,208.0,1.0,...,Us Gov Vessel,3,1,15.1,13.7,10.0,15.1,91.8,3/18/11 11:18,3/18/11 12:26
2,21,1.0,Us Gov Vessel,Y,Unknown,Unknown,1.0,208.0,208.0,1.0,...,Us Gov Vessel,4,1,18.0,9.7,4.6,15.2,76.3,4/25/11 16:37,4/25/11 18:25
3,21,1.0,Us Gov Vessel,Y,Unknown,Unknown,1.0,208.0,208.0,1.0,...,Us Gov Vessel,5,1,11.2,12.9,6.1,15.6,80.7,5/14/11 15:51,5/14/11 16:50
4,21,1.0,Us Gov Vessel,Y,Unknown,Unknown,1.0,208.0,208.0,1.0,...,Us Gov Vessel,6,1,5.8,16.5,15.1,17.4,100.0,5/19/11 12:34,5/19/11 12:56
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
262521,987654321,,,,,,,,,,...,Island Lookout,64,1,11.7,5.8,5.6,6.1,0.0,5/31/10 14:27,5/31/10 16:35
262522,987654321,,,,,,,,,,...,Island Lookout,65,1,11.4,5.5,5.1,6.0,0.0,6/5/10 5:25,6/5/10 7:32
262523,987654321,,,,,,,,,,...,Island Lookout,71,1,10.1,4.5,1.9,6.5,0.0,6/27/10 2:35,6/27/10 5:04
262524,987654321,,,,,,,,,,...,Island Lookout,73,1,7.6,5.3,5.1,5.5,0.0,7/1/10 3:49,7/1/10 5:15


### Getting inner join using outer join results

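To get the inner join from the outer join result, we keep only the rows where both DataFrames have a matching `mmsi`. In the outer join, a row that comes from only one DataFrame has NaN in every column of the other, so I drop every row that contains a NaN. This assumes neither original DataFrame has missing values of its own; otherwise matched rows with a genuinely missing value would be dropped as well.

In [72]:
# dropna() with no arguments checks every column, so no explicit column list is needed
inner_join_using_outer = outer_merged_data.dropna()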
inner_join_using_outer

Unnamed: 0,mmsi,num_names,names,sov,flag,flag_type,num_loas,loa,max_loa,num_types,...,name,transit,segment,seg_length,avg_sog,min_sog,max_sog,pdgt10,st_time,end_time
0,21,1.0,Us Gov Vessel,Y,Unknown,Unknown,1.0,208.0,208.0,1.0,...,Us Gov Vessel,2,1,48.7,6.6,3.4,16.3,38.4,3/14/11 16:13,3/15/11 0:02
1,21,1.0,Us Gov Vessel,Y,Unknown,Unknown,1.0,208.0,208.0,1.0,...,Us Gov Vessel,3,1,15.1,13.7,10.0,15.1,91.8,3/18/11 11:18,3/18/11 12:26
2,21,1.0,Us Gov Vessel,Y,Unknown,Unknown,1.0,208.0,208.0,1.0,...,Us Gov Vessel,4,1,18.0,9.7,4.6,15.2,76.3,4/25/11 16:37,4/25/11 18:25
3,21,1.0,Us Gov Vessel,Y,Unknown,Unknown,1.0,208.0,208.0,1.0,...,Us Gov Vessel,5,1,11.2,12.9,6.1,15.6,80.7,5/14/11 15:51,5/14/11 16:50
4,21,1.0,Us Gov Vessel,Y,Unknown,Unknown,1.0,208.0,208.0,1.0,...,Us Gov Vessel,6,1,5.8,16.5,15.1,17.4,100.0,5/19/11 12:34,5/19/11 12:56
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
197809,999999999,1.0,Triple Attraction,N,Unknown,Unknown,1.0,30.0,30.0,1.0,...,Triple Attraction,3,1,5.3,20.0,19.6,20.4,100.0,6/15/10 12:49,6/15/10 13:05
197810,999999999,1.0,Triple Attraction,N,Unknown,Unknown,1.0,30.0,30.0,1.0,...,Triple Attraction,4,1,18.7,19.2,18.4,19.9,100.0,6/15/10 21:32,6/15/10 22:29
197811,999999999,1.0,Triple Attraction,N,Unknown,Unknown,1.0,30.0,30.0,1.0,...,Triple Attraction,6,1,17.4,17.0,14.7,18.4,100.0,6/17/10 19:16,6/17/10 20:17
197812,999999999,1.0,Triple Attraction,N,Unknown,Unknown,1.0,30.0,30.0,1.0,...,Triple Attraction,7,1,31.5,14.2,13.4,15.1,100.0,6/18/10 2:52,6/18/10 5:03
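
Dropping rows with NaN relies on the original data having no missing values. A possible alternative, sketched below on made-up data, is the `indicator=True` option of `pd.merge`, which adds a `_merge` column marking each row as `left_only`, `right_only`, or `both`. The `left` and `right` frames are hypothetical; the second vessel deliberately has a missing `type` to show the difference.

```python
import pandas as pd

# Hypothetical toy data; mmsi 2 has a genuinely missing type.
left = pd.DataFrame({"mmsi": [1, 2, 3], "type": ["Cargo", None, "Tug"]})
right = pd.DataFrame({"mmsi": [2, 3, 4], "seg_length": [5.1, 9.2, 4.3]})

outer = pd.merge(left, right, on="mmsi", how="outer", indicator=True)

# '_merge' is 'both' only for keys present in both inputs.
inner_from_outer = (
    outer[outer["_merge"] == "both"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)

print(inner_from_outer["mmsi"].tolist())  # [2, 3]
print(outer.dropna()["mmsi"].tolist())    # [3]
```

Here the NaN-based filter loses the matched row for mmsi 2, while the `_merge` filter keeps it.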


### Inner join using pandas

Here I call the inner join provided by pandas directly.

In [66]:
direct_inner_join = pd.merge(vessel_info_filtered, transit_seg, on='mmsi', how='inner')
direct_inner_join

Unnamed: 0,mmsi,num_names,names,sov,flag,flag_type,num_loas,loa,max_loa,num_types,...,name,transit,segment,seg_length,avg_sog,min_sog,max_sog,pdgt10,st_time,end_time
0,21,1,Us Gov Vessel,Y,Unknown,Unknown,1,208.0,208.0,1,...,Us Gov Vessel,2,1,48.7,6.6,3.4,16.3,38.4,3/14/11 16:13,3/15/11 0:02
1,21,1,Us Gov Vessel,Y,Unknown,Unknown,1,208.0,208.0,1,...,Us Gov Vessel,3,1,15.1,13.7,10.0,15.1,91.8,3/18/11 11:18,3/18/11 12:26
2,21,1,Us Gov Vessel,Y,Unknown,Unknown,1,208.0,208.0,1,...,Us Gov Vessel,4,1,18.0,9.7,4.6,15.2,76.3,4/25/11 16:37,4/25/11 18:25
3,21,1,Us Gov Vessel,Y,Unknown,Unknown,1,208.0,208.0,1,...,Us Gov Vessel,5,1,11.2,12.9,6.1,15.6,80.7,5/14/11 15:51,5/14/11 16:50
4,21,1,Us Gov Vessel,Y,Unknown,Unknown,1,208.0,208.0,1,...,Us Gov Vessel,6,1,5.8,16.5,15.1,17.4,100.0,5/19/11 12:34,5/19/11 12:56
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
197809,999999999,1,Triple Attraction,N,Unknown,Unknown,1,30.0,30.0,1,...,Triple Attraction,3,1,5.3,20.0,19.6,20.4,100.0,6/15/10 12:49,6/15/10 13:05
197810,999999999,1,Triple Attraction,N,Unknown,Unknown,1,30.0,30.0,1,...,Triple Attraction,4,1,18.7,19.2,18.4,19.9,100.0,6/15/10 21:32,6/15/10 22:29
197811,999999999,1,Triple Attraction,N,Unknown,Unknown,1,30.0,30.0,1,...,Triple Attraction,6,1,17.4,17.0,14.7,18.4,100.0,6/17/10 19:16,6/17/10 20:17
197812,999999999,1,Triple Attraction,N,Unknown,Unknown,1,30.0,30.0,1,...,Triple Attraction,7,1,31.5,14.2,13.4,15.1,100.0,6/18/10 2:52,6/18/10 5:03


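Both results show the same rows and values in the displayed output. The one visible difference is the dtype of the integer columns from vessel_info (for example num_names shows as 1.0 in the outer-join result and as 1 in the direct inner join), because the NaN values introduced by the outer join convert those columns to floats. A strict equality check would need those columns cast back to integers first.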
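
One way to make this comparison explicit is sketched below on made-up data, since the AIS files are not assumed to be available here. The frames `left` and `right` are hypothetical. `DataFrame.equals` also compares dtypes, so the result derived from the outer join is cast back to the dtypes of the direct inner join before comparing.

```python
import pandas as pd

# Hypothetical toy data with an integer column that the outer join will upcast.
left = pd.DataFrame({"mmsi": [1, 2, 3], "num_names": [8, 3, 1]})
right = pd.DataFrame({"mmsi": [2, 3, 4], "seg_length": [5.1, 9.2, 4.3]})

inner = pd.merge(left, right, on="mmsi", how="inner")
outer = pd.merge(left, right, on="mmsi", how="outer")
from_outer = outer.dropna().reset_index(drop=True)

# Unmatched rows introduce NaN, which upcasts num_names from integer to float.
print(from_outer["num_names"].dtype, inner["num_names"].dtype)

# Cast back to the dtypes of the direct inner join, then compare strictly.
from_outer = from_outer.astype(inner.dtypes.to_dict())
print(from_outer.equals(inner))
```

On the real data the two results may also need to be sorted the same way before comparing, since the row order of an outer join is not guaranteed to match that of an inner join.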