# LIST OF CUSTOMS OFFICES AT THE FRENCH POINTS OF ENTRY INTO THE EUROPEAN UNION (PORTS)

## *Scraping/Parsing/Cleaning/Reshaping/Translating data from a PDF*


The data will be extracted from a PDF containing [a list of the French points of entry into the European Union and their respective customs offices.](https://www.douane.gouv.fr/sites/default/files/uploads/files/Services-en-ligne/ICS/ics-liste-points-d-entree-francais.pdf)

As stated by the French Customs:

> * **"The names of the ports which constitute the first points of entry, or subsequent points, are included here, as well as the codes of the customs offices which are attached to them."**
> * **"The maritime points of entry open to the office of entry-ENT competence experience traffic subject to the obligations arising from ICS (Import Control System)"**

In the file `ics-liste-points-d-entree-francais.pdf`, the port data starts on page 0 and ends on page 3 (page indexes in `pdfplumber` start at 0), and the table on each page has a header row.

The data will be extracted using the [`pdfplumber`](https://github.com/jsvine/pdfplumber) tool.

#### Steps:
1. Import dependencies
2. Open the PDF and check its structure
3. Create an empty data frame and specify the columns
4. Create a function to extract data from a single PDF page and return a data frame
5. Loop over the pages and call the function on each page
6. Find the last table and extract its data
7. Clean up and translate the data
8. Run a quick basic analysis in pandas
9. Write the data to a CSV file

### 1. Import dependencies

In [1]:
```python
import pdfplumber
import pandas as pd
```

In [2]:
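```python
# Set pandas display options (optional)
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
# None means "no limit"; the older value -1 is no longer accepted by recent pandas versions
pd.set_option("display.max_colwidth", None)
```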
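If you would rather not change these settings for the whole session, pandas also provides `pd.option_context`, which applies options only inside a `with` block and restores the previous values afterwards. A minimal sketch; the one-row `sample` frame is made up for illustration:

```python
import pandas as pd

# A tiny made-up frame with one long text value
sample = pd.DataFrame({"traffic": ["Maritime (P) Fluvial (C) Routier (R)"]})

# These options apply only inside the with block
with pd.option_context("display.max_colwidth", None, "display.max_rows", None):
    inside = pd.get_option("display.max_colwidth")
    print(sample)

# Outside the block, the previous setting is back in force
outside = pd.get_option("display.max_colwidth")
print(inside, outside)
```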

### 2. Open the PDF and check its structure
Use `pdfplumber` to open the file, explore the PDF structure and extract the ports table that starts on page 0.

In [3]:
```python
with pdfplumber.open("ics-liste-points-d-entree-francais.pdf") as pdf:
    # print(pdf)
    #print(pdf.pages)
    test = pdf.pages[0]
    table = test.extract_table()
    print(table)
```

```
[["Nom du \npoint d'entrée", 'Nom du \nbureau', 'Code \nbureau', 'Trafic', 'Code\nport'], ['►Vecteur maritime (1)', None, None, None, None], ['Ajaccio', 'Ajaccio', 'FR000040', 'Maritime (P) \nRoutier (R)', 'FRAJA'], ['Bastia', 'Bastia port', 'FR000380', 'Maritime (P) \nRoutier (R)', 'FRBIA'], ['Bayonne', 'Bayonne', 'FR000390', 'Maritime (P) \nRoutier (R)', 'FRBAY'], ['Berre l’Étang (raffinerie)', 'Port-de-Bouc port', 'FR003620', 'Maritime (P) \nRoutier (R)', 'FRETB'], ['Bordeaux', 'Bassens', 'FR000610', 'Maritime (P) \nRoutier (R)', 'FRBAS\nFRBSE'], ['Boulogne-sur-Mer', 'Boulogne', 'FR000630', 'Maritime (P) \nRoutier (R)', 'FRBOL'], ['Brest', 'Brest', 'FR000690', 'Maritime (P) \nRoutier (R)', 'FRBES'], ['Caen Ouistreham', 'Caen', 'FR000720', 'Maritime (P) \nRoutier (R)', 'FRCFR'], ['Calais', 'Boulogne', 'FR000630', 'Maritime (P) \nRoutier (R)', 'FRCQF'], ['Cannes', 'Cannes', 'FR000800', 'Maritime (P) \nRoutier (R)', 'FRCEQ'], ['Cherbourg', 'Caen', 'FR000720', 'Maritime (P) \nRoutier (R
```
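The raw output shows that section headings such as `►Vecteur maritime (1)` come through as rows whose other cells are all `None`. A small helper can flag them before any cleaning; `is_heading_row` is a hypothetical name, and the sample rows are copied from the output above:

```python
def is_heading_row(row):
    """Return True when only the first cell holds text (a section heading row)."""
    first, *rest = row
    return bool(first) and all(cell is None for cell in rest)


# Rows copied from the extracted table above
rows = [
    ["►Vecteur maritime (1)", None, None, None, None],
    ["Ajaccio", "Ajaccio", "FR000040", "Maritime (P) \nRoutier (R)", "FRAJA"],
    ["Bastia", "Bastia port", "FR000380", "Maritime (P) \nRoutier (R)", "FRBIA"],
]

headings = [row[0] for row in rows if is_heading_row(row)]
print(headings)  # ['►Vecteur maritime (1)']
```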

### 3. Create an empty data frame and define the columns

In [4]:
```python
cols = ["point_of_entry_name", "office_name", "office_code", "traffic", "port_code"]

df = pd.DataFrame(columns=cols)
```

### 4. Create a function to extract data from a single PDF page
We will call this function on each PDF page. It takes a `pdfplumber.Page` object, extracts the table and returns the data in a data frame with the same columns as the empty one we just created.

In [5]:
```python
def page_to_df(page):

    # Find the table on the page and extract the data
    table = page.extract_table()

    # Grab all rows in the table except for the first one,
    # which is the header row
    lines = table[1:]

    # Return the data in a data frame
    return pd.DataFrame(lines, columns=cols)
```
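Because `page_to_df` only calls `page.extract_table()`, it can be tried without the PDF by passing any object that has that method. The `FakePage` class below is a hypothetical stand-in, not part of `pdfplumber`, and its rows are copied from the page 0 output shown earlier:

```python
import pandas as pd

cols = ["point_of_entry_name", "office_name", "office_code", "traffic", "port_code"]


def page_to_df(page):
    # Same logic as above: extract the table and skip its header row
    table = page.extract_table()
    return pd.DataFrame(table[1:], columns=cols)


class FakePage:
    """Hypothetical stand-in exposing only extract_table()."""

    def extract_table(self):
        return [
            ["Nom du \npoint d'entrée", "Nom du \nbureau", "Code \nbureau", "Trafic", "Code\nport"],
            ["►Vecteur maritime (1)", None, None, None, None],
            ["Ajaccio", "Ajaccio", "FR000040", "Maritime (P) \nRoutier (R)", "FRAJA"],
        ]


result = page_to_df(FakePage())
print(result.shape)  # (2, 5)
```

The header row is gone, while the `►Vecteur maritime (1)` heading row is kept and has to be handled during cleanup.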

### 5. Loop over the pages and call the function on each page

As we extract the data from each page, we'll append the data frame returned by our function to the empty data frame (`df`) that we created earlier.

In [6]:
```python
# Open the PDF
with pdfplumber.open("ics-liste-points-d-entree-francais.pdf") as pdf:

    # Select only the first 3 pages (indexes 0 to 2), which hold most of the list of maritime ports
    pages_with_data = pdf.pages[0:3]

    # Loop over the pages with data
    for page in pages_with_data:

        # Call the extraction function to grab the data from this page
        df_to_append = page_to_df(page)

        # Append it to our main data frame; ignore_index=True renumbers the rows 0, 1, 2, ...
        # (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
        df = pd.concat([df, df_to_append], ignore_index=True)
```

In [7]:
```python
df
```

| | point_of_entry_name | office_name | office_code | traffic | port_code |
|---|---|---|---|---|---|
| 0 | ►Vecteur maritime (1) | | | | |
| 1 | Ajaccio | Ajaccio | FR000040 | Maritime (P) \nRoutier (R) | FRAJA |
| 2 | Bastia | Bastia port | FR000380 | Maritime (P) \nRoutier (R) | FRBIA |
| 3 | Bayonne | Bayonne | FR000390 | Maritime (P) \nRoutier (R) | FRBAY |
| 4 | Berre l’Étang (raffinerie) | Port-de-Bouc port | FR003620 | Maritime (P) \nRoutier (R) | FRETB |
| 5 | Bordeaux | Bassens | FR000610 | Maritime (P) \nRoutier (R) | FRBAS\nFRBSE |
| 6 | Boulogne-sur-Mer | Boulogne | FR000630 | Maritime (P) \nRoutier (R) | FRBOL |
| 7 | Brest | Brest | FR000690 | Maritime (P) \nRoutier (R) | FRBES |
| 8 | Caen Ouistreham | Caen | FR000720 | Maritime (P) \nRoutier (R) | FRCFR |
| 9 | Calais | Boulogne | FR000630 | Maritime (P) \nRoutier (R) | FRCQF |
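Growing `df` inside the loop copies the whole frame on every page. With three pages the cost is negligible, but a common alternative is to collect the per-page frames in a list and concatenate once. A sketch, with two made-up frames standing in for `page_to_df` results:

```python
import pandas as pd

cols = ["point_of_entry_name", "office_name", "office_code", "traffic", "port_code"]

# Made-up per-page frames standing in for page_to_df(page) results
page_frames = [
    pd.DataFrame([["Ajaccio", "Ajaccio", "FR000040", "Maritime (P)", "FRAJA"]], columns=cols),
    pd.DataFrame([["Brest", "Brest", "FR000690", "Maritime (P)", "FRBES"]], columns=cols),
]

# Concatenate once; ignore_index=True renumbers the rows from 0
combined = pd.concat(page_frames, ignore_index=True)
print(combined.index.tolist())  # [0, 1]
```

Inside the `with` block, the same idea would read `df = pd.concat([page_to_df(p) for p in pdf.pages[0:3]], ignore_index=True)`, which also makes the empty starting frame unnecessary.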


### 6. Find the last table and extract its data

The last part of the list of maritime ports lies on page 4 of the PDF (`pdf.pages[3]`), along with the first table of the list of airports, which uses a different format.

Since `page.extract_table()` returns only the largest table on a page, we instead use `page.find_tables()` to get every table on page 4 and extract the data from the first one.

In [8]:
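```python
# The earlier `with` block has closed the PDF, so reopen it
with pdfplumber.open("ics-liste-points-d-entree-francais.pdf") as pdf:

    # Select only page 4 (index 3), which holds the last table of the list of maritime ports
    last_page = pdf.pages[3]

    # Search for all tables on the page
    tables = last_page.find_tables()

    # Select only the first table on the page and extract its content
    last_page_data = tables[0].extract(x_tolerance=5)
```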
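Relying on `tables[0]` assumes the ports table is always listed first on the page. A slightly more defensive sketch picks the extracted table whose header row matches the one seen on the earlier pages. `pick_ports_table` is a hypothetical helper, and the first table below is a made-up placeholder, not the real airports table:

```python
def pick_ports_table(extracted_tables, expected_header):
    """Return the first extracted table whose header row equals expected_header."""
    for table in extracted_tables:
        if table and table[0] == expected_header:
            return table
    raise ValueError("No table with the expected header was found")


# Header row as extracted from the ports tables
ports_header = ["Nom du \npoint d'entrée", "Nom du \nbureau", "Code \nbureau", "Trafic", "Code\nport"]

extracted = [
    [["Placeholder A", "Placeholder B"], ["x", "y"]],  # made-up stand-in for another table
    [ports_header, ["Mâcon", "Chalon-sur-Saône", "FR000860", "Maritime (P)\nFluvial (C)", "FRMAC"]],
]

ports_table = pick_ports_table(extracted, ports_header)
print(ports_table[1][0])  # Mâcon
```

With `pdfplumber`, `extracted` would be built as `[t.extract(x_tolerance=5) for t in last_page.find_tables()]`. Exact header matching is strict, so if the header text is extracted slightly differently on some pages, compare a normalized form instead.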

In [9]:
```python
# Check that the data from the last table have been extracted
last_page_data
```

```
[["Nom du \npoint d'entrée",
  'Nom du \nbureau',
  'Code \nbureau',
  'Trafic',
  'Code\nport'],
 ['►Vecteur fluvio-maritime', None, None, None, None],
 ['Lyon port Edouard Herriot',
  'Lyon aéroport',
  'FR002650',
  'Maritime (P)\nFluvial (C)',
  'FRLIO'],
 ['Mâcon',
  'Chalon-sur-Saône',
  'FR000860',
  'Maritime (P)\nFluvial (C)',
  'FRMAC'],
 ['Salaise-sur-Sanne',
  "L'Isle d'Abeau",
  'FR002030',
  'Maritime (P)\nFluvial (C)',
  'FRSAL'],
 ['Valence',
  'Valence',
  'FR004550',
  'Maritime (P)\nFluvial (C)',
  'FRVAA\nFRVAC\nFRVAF'],
 ['Villefranche-sur-Saône',
  'Lyon aéroport',
  'FR002650',
  'Maritime (P)\nFluvial (C)',
  'FRVSS'],
 ['DOM', None, None, None, None],
 ['Degrad des Cannes (Guyane)',
  'Degrad des Cannes',
  'FR006370',
  'Maritime (P) \nFluvial (C)\nRoutier (R)',
  'GFCAY'],
 ['Saint-Laurent du Maroni (Guyane)',
  'Saint-Laurent du Maroni',
  'FR006390',
  'Maritime (P) \nFluvial (C)\nRoutier (R)(2)',
  'GFSLM'],
 ["Saint-Georges de l'Oyapock",
  "Saint-Georges
```

In [10]:
```python
# Convert the data extracted inside `last_page_data` into a dataframe called `df2`
df2 = pd.DataFrame(last_page_data, columns=cols)
```

In [11]:
```python
df2
```

| | point_of_entry_name | office_name | office_code | traffic | port_code |
|---|---|---|---|---|---|
| 0 | Nom du \npoint d'entrée | Nom du \nbureau | Code \nbureau | Trafic | Code\nport |
| 1 | ►Vecteur fluvio-maritime | | | | |
| 2 | Lyon port Edouard Herriot | Lyon aéroport | FR002650 | Maritime (P)\nFluvial (C) | FRLIO |
| 3 | Mâcon | Chalon-sur-Saône | FR000860 | Maritime (P)\nFluvial (C) | FRMAC |
| 4 | Salaise-sur-Sanne | L'Isle d'Abeau | FR002030 | Maritime (P)\nFluvial (C) | FRSAL |
| 5 | Valence | Valence | FR004550 | Maritime (P)\nFluvial (C) | FRVAA\nFRVAC\nFRVAF |
| 6 | Villefranche-sur-Saône | Lyon aéroport | FR002650 | Maritime (P)\nFluvial (C) | FRVSS |
| 7 | DOM | | | | |
| 8 | Degrad des Cannes (Guyane) | Degrad des Cannes | FR006370 | Maritime (P) \nFluvial (C)\nRoutier (R) | GFCAY |
| 9 | Saint-Laurent du Maroni (Guyane) | Saint-Laurent du Maroni | FR006390 | Maritime (P) \nFluvial (C)\nRoutier (R)(2) | GFSLM |


In [12]:
```python
# Append the last table of data (`df2`) to our main data frame `df`
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
df = pd.concat([df, df2], ignore_index=True)

df
```

| | point_of_entry_name | office_name | office_code | traffic | port_code |
|---|---|---|---|---|---|
| 0 | ►Vecteur maritime (1) | | | | |
| 1 | Ajaccio | Ajaccio | FR000040 | Maritime (P) \nRoutier (R) | FRAJA |
| 2 | Bastia | Bastia port | FR000380 | Maritime (P) \nRoutier (R) | FRBIA |
| 3 | Bayonne | Bayonne | FR000390 | Maritime (P) \nRoutier (R) | FRBAY |
| 4 | Berre l’Étang (raffinerie) | Port-de-Bouc port | FR003620 | Maritime (P) \nRoutier (R) | FRETB |
| 5 | Bordeaux | Bassens | FR000610 | Maritime (P) \nRoutier (R) | FRBAS\nFRBSE |
| 6 | Boulogne-sur-Mer | Boulogne | FR000630 | Maritime (P) \nRoutier (R) | FRBOL |
| 7 | Brest | Brest | FR000690 | Maritime (P) \nRoutier (R) | FRBES |
| 8 | Caen Ouistreham | Caen | FR000720 | Maritime (P) \nRoutier (R) | FRCFR |
| 9 | Calais | Boulogne | FR000630 | Maritime (P) \nRoutier (R) | FRCQF |


### 7. Clean up and translate the data

In [13]:
```python
# Replace line breaks with spaces
df.replace("\n", " ", inplace=True, regex=True)

df
```

| | point_of_entry_name | office_name | office_code | traffic | port_code |
|---|---|---|---|---|---|
| 0 | ►Vecteur maritime (1) | | | | |
| 1 | Ajaccio | Ajaccio | FR000040 | Maritime (P) Routier (R) | FRAJA |
| 2 | Bastia | Bastia port | FR000380 | Maritime (P) Routier (R) | FRBIA |
| 3 | Bayonne | Bayonne | FR000390 | Maritime (P) Routier (R) | FRBAY |
| 4 | Berre l’Étang (raffinerie) | Port-de-Bouc port | FR003620 | Maritime (P) Routier (R) | FRETB |
| 5 | Bordeaux | Bassens | FR000610 | Maritime (P) Routier (R) | FRBAS FRBSE |
| 6 | Boulogne-sur-Mer | Boulogne | FR000630 | Maritime (P) Routier (R) | FRBOL |
| 7 | Brest | Brest | FR000690 | Maritime (P) Routier (R) | FRBES |
| 8 | Caen Ouistreham | Caen | FR000720 | Maritime (P) Routier (R) | FRCFR |
| 9 | Calais | Boulogne | FR000630 | Maritime (P) Routier (R) | FRCQF |


In [14]:
```python
# Uppercase the text columns and strip surrounding whitespace in all columns
df["point_of_entry_name"] = df.point_of_entry_name.str.upper().str.strip()
df["office_name"] = df.office_name.str.upper().str.strip()
df["office_code"] = df.office_code.str.strip()
df["traffic"] = df.traffic.str.upper().str.strip()
df["port_code"] = df.port_code.str.strip()

df
```

| | point_of_entry_name | office_name | office_code | traffic | port_code |
|---|---|---|---|---|---|
| 0 | ►VECTEUR MARITIME (1) | | | | |
| 1 | AJACCIO | AJACCIO | FR000040 | MARITIME (P) ROUTIER (R) | FRAJA |
| 2 | BASTIA | BASTIA PORT | FR000380 | MARITIME (P) ROUTIER (R) | FRBIA |
| 3 | BAYONNE | BAYONNE | FR000390 | MARITIME (P) ROUTIER (R) | FRBAY |
| 4 | BERRE L’ÉTANG (RAFFINERIE) | PORT-DE-BOUC PORT | FR003620 | MARITIME (P) ROUTIER (R) | FRETB |
| 5 | BORDEAUX | BASSENS | FR000610 | MARITIME (P) ROUTIER (R) | FRBAS FRBSE |
| 6 | BOULOGNE-SUR-MER | BOULOGNE | FR000630 | MARITIME (P) ROUTIER (R) | FRBOL |
| 7 | BREST | BREST | FR000690 | MARITIME (P) ROUTIER (R) | FRBES |
| 8 | CAEN OUISTREHAM | CAEN | FR000720 | MARITIME (P) ROUTIER (R) | FRCFR |
| 9 | CALAIS | BOULOGNE | FR000630 | MARITIME (P) ROUTIER (R) | FRCQF |


In [15]:
```python
# Remove all French accents from strings in columns "point_of_entry_name", "office_name" and "traffic"
df["point_of_entry_name"] = df.point_of_entry_name.str.normalize("NFKD") \
                                                  .str.encode("ascii", errors="ignore").str.decode("utf-8")
df["office_name"] = df.office_name.str.normalize("NFKD").str.encode("ascii", errors="ignore").str.decode("utf-8")
df["traffic"] = df.traffic.str.normalize("NFKD").str.encode("ascii", errors="ignore").str.decode("utf-8")

df
```

| | point_of_entry_name | office_name | office_code | traffic | port_code |
|---|---|---|---|---|---|
| 0 | VECTEUR MARITIME (1) | | | | |
| 1 | AJACCIO | AJACCIO | FR000040 | MARITIME (P) ROUTIER (R) | FRAJA |
| 2 | BASTIA | BASTIA PORT | FR000380 | MARITIME (P) ROUTIER (R) | FRBIA |
| 3 | BAYONNE | BAYONNE | FR000390 | MARITIME (P) ROUTIER (R) | FRBAY |
| 4 | BERRE LETANG (RAFFINERIE) | PORT-DE-BOUC PORT | FR003620 | MARITIME (P) ROUTIER (R) | FRETB |
| 5 | BORDEAUX | BASSENS | FR000610 | MARITIME (P) ROUTIER (R) | FRBAS FRBSE |
| 6 | BOULOGNE-SUR-MER | BOULOGNE | FR000630 | MARITIME (P) ROUTIER (R) | FRBOL |
| 7 | BREST | BREST | FR000690 | MARITIME (P) ROUTIER (R) | FRBES |
| 8 | CAEN OUISTREHAM | CAEN | FR000720 | MARITIME (P) ROUTIER (R) | FRCFR |
| 9 | CALAIS | BOULOGNE | FR000630 | MARITIME (P) ROUTIER (R) | FRCQF |
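The `normalize("NFKD")` and `encode("ascii", errors="ignore")` chain works because NFKD splits an accented letter such as `É` into a base letter plus a combining accent, and the ASCII encoding then drops every non-ASCII code point. It also drops characters that have no ASCII base, such as `►` and the typographic apostrophe `’`, which is why `BERRE L’ÉTANG` became `BERRE LETANG`. The same logic for a single string, using only the standard library (`strip_accents` is a hypothetical helper name):

```python
import unicodedata


def strip_accents(text):
    # Decompose accented letters, then drop every non-ASCII code point
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("utf-8")


print(strip_accents("BERRE L’ÉTANG (RAFFINERIE)"))  # BERRE LETANG (RAFFINERIE)
print(strip_accents("►VECTEUR MARITIME (1)"))  # VECTEUR MARITIME (1)
print(strip_accents("MÂCON"))  # MACON
```

To keep the apostrophe, replace `’` with `'` before normalizing.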


In [16]:
```python
# Rows 0 to 58 fall under the "Vecteur maritime" heading
# (.loc slices by label and includes both endpoints)
# Assign the value "SEA PORT" to those rows in a new column "port_type"
df.loc[0:58, "port_type"] = "SEA PORT"

df
```

| | point_of_entry_name | office_name | office_code | traffic | port_code | port_type |
|---|---|---|---|---|---|---|
| 0 | VECTEUR MARITIME (1) | | | | | SEA PORT |
| 1 | AJACCIO | AJACCIO | FR000040 | MARITIME (P) ROUTIER (R) | FRAJA | SEA PORT |
| 2 | BASTIA | BASTIA PORT | FR000380 | MARITIME (P) ROUTIER (R) | FRBIA | SEA PORT |
| 3 | BAYONNE | BAYONNE | FR000390 | MARITIME (P) ROUTIER (R) | FRBAY | SEA PORT |
| 4 | BERRE LETANG (RAFFINERIE) | PORT-DE-BOUC PORT | FR003620 | MARITIME (P) ROUTIER (R) | FRETB | SEA PORT |
| 5 | BORDEAUX | BASSENS | FR000610 | MARITIME (P) ROUTIER (R) | FRBAS FRBSE | SEA PORT |
| 6 | BOULOGNE-SUR-MER | BOULOGNE | FR000630 | MARITIME (P) ROUTIER (R) | FRBOL | SEA PORT |
| 7 | BREST | BREST | FR000690 | MARITIME (P) ROUTIER (R) | FRBES | SEA PORT |
| 8 | CAEN OUISTREHAM | CAEN | FR000720 | MARITIME (P) ROUTIER (R) | FRCFR | SEA PORT |
| 9 | CALAIS | BOULOGNE | FR000630 | MARITIME (P) ROUTIER (R) | FRCQF | SEA PORT |


In [17]:
```python
# Rows 59 to 74 fall under the "Vecteur fluvio-maritime" heading
# Assign the value "SEA AND RIVER PORT" to those rows in column "port_type"
df.loc[59:74, "port_type"] = "SEA AND RIVER PORT"

df
```

| | point_of_entry_name | office_name | office_code | traffic | port_code | port_type |
|---|---|---|---|---|---|---|
| 0 | VECTEUR MARITIME (1) | | | | | SEA PORT |
| 1 | AJACCIO | AJACCIO | FR000040 | MARITIME (P) ROUTIER (R) | FRAJA | SEA PORT |
| 2 | BASTIA | BASTIA PORT | FR000380 | MARITIME (P) ROUTIER (R) | FRBIA | SEA PORT |
| 3 | BAYONNE | BAYONNE | FR000390 | MARITIME (P) ROUTIER (R) | FRBAY | SEA PORT |
| 4 | BERRE LETANG (RAFFINERIE) | PORT-DE-BOUC PORT | FR003620 | MARITIME (P) ROUTIER (R) | FRETB | SEA PORT |
| 5 | BORDEAUX | BASSENS | FR000610 | MARITIME (P) ROUTIER (R) | FRBAS FRBSE | SEA PORT |
| 6 | BOULOGNE-SUR-MER | BOULOGNE | FR000630 | MARITIME (P) ROUTIER (R) | FRBOL | SEA PORT |
| 7 | BREST | BREST | FR000690 | MARITIME (P) ROUTIER (R) | FRBES | SEA PORT |
| 8 | CAEN OUISTREHAM | CAEN | FR000720 | MARITIME (P) ROUTIER (R) | FRCFR | SEA PORT |
| 9 | CALAIS | BOULOGNE | FR000630 | MARITIME (P) ROUTIER (R) | FRCQF | SEA PORT |


In [18]:
```python
# Rows under the "Metropole" (mainland) headings: 0 to 47 for sea ports, 59 to 69 for sea and river ports
# Assign the value "MAINLAND" to those rows in a new column "territory"
df.loc[0:47, "territory"] = "MAINLAND"
df.loc[59:69, "territory"] = "MAINLAND"

df
```

| | point_of_entry_name | office_name | office_code | traffic | port_code | port_type | territory |
|---|---|---|---|---|---|---|---|
| 0 | VECTEUR MARITIME (1) | | | | | SEA PORT | MAINLAND |
| 1 | AJACCIO | AJACCIO | FR000040 | MARITIME (P) ROUTIER (R) | FRAJA | SEA PORT | MAINLAND |
| 2 | BASTIA | BASTIA PORT | FR000380 | MARITIME (P) ROUTIER (R) | FRBIA | SEA PORT | MAINLAND |
| 3 | BAYONNE | BAYONNE | FR000390 | MARITIME (P) ROUTIER (R) | FRBAY | SEA PORT | MAINLAND |
| 4 | BERRE LETANG (RAFFINERIE) | PORT-DE-BOUC PORT | FR003620 | MARITIME (P) ROUTIER (R) | FRETB | SEA PORT | MAINLAND |
| 5 | BORDEAUX | BASSENS | FR000610 | MARITIME (P) ROUTIER (R) | FRBAS FRBSE | SEA PORT | MAINLAND |
| 6 | BOULOGNE-SUR-MER | BOULOGNE | FR000630 | MARITIME (P) ROUTIER (R) | FRBOL | SEA PORT | MAINLAND |
| 7 | BREST | BREST | FR000690 | MARITIME (P) ROUTIER (R) | FRBES | SEA PORT | MAINLAND |
| 8 | CAEN OUISTREHAM | CAEN | FR000720 | MARITIME (P) ROUTIER (R) | FRCFR | SEA PORT | MAINLAND |
| 9 | CALAIS | BOULOGNE | FR000630 | MARITIME (P) ROUTIER (R) | FRCQF | SEA PORT | MAINLAND |


In [19]:
```python
# Rows under the "DOM" (overseas departments) headings: 48 to 58 and 70 to 74
# Assign the value "OVERSEAS" to those rows in column "territory"
df.loc[48:58, "territory"] = "OVERSEAS"
df.loc[70:74, "territory"] = "OVERSEAS"

df
```

| | point_of_entry_name | office_name | office_code | traffic | port_code | port_type | territory |
|---|---|---|---|---|---|---|---|
| 0 | VECTEUR MARITIME (1) | | | | | SEA PORT | MAINLAND |
| 1 | AJACCIO | AJACCIO | FR000040 | MARITIME (P) ROUTIER (R) | FRAJA | SEA PORT | MAINLAND |
| 2 | BASTIA | BASTIA PORT | FR000380 | MARITIME (P) ROUTIER (R) | FRBIA | SEA PORT | MAINLAND |
| 3 | BAYONNE | BAYONNE | FR000390 | MARITIME (P) ROUTIER (R) | FRBAY | SEA PORT | MAINLAND |
| 4 | BERRE LETANG (RAFFINERIE) | PORT-DE-BOUC PORT | FR003620 | MARITIME (P) ROUTIER (R) | FRETB | SEA PORT | MAINLAND |
| 5 | BORDEAUX | BASSENS | FR000610 | MARITIME (P) ROUTIER (R) | FRBAS FRBSE | SEA PORT | MAINLAND |
| 6 | BOULOGNE-SUR-MER | BOULOGNE | FR000630 | MARITIME (P) ROUTIER (R) | FRBOL | SEA PORT | MAINLAND |
| 7 | BREST | BREST | FR000690 | MARITIME (P) ROUTIER (R) | FRBES | SEA PORT | MAINLAND |
| 8 | CAEN OUISTREHAM | CAEN | FR000720 | MARITIME (P) ROUTIER (R) | FRCFR | SEA PORT | MAINLAND |
| 9 | CALAIS | BOULOGNE | FR000630 | MARITIME (P) ROUTIER (R) | FRCQF | SEA PORT | MAINLAND |


In [20]:
```python
# Drop the rows that hold vector and territory headings (and the repeated table header)
# rather than port data, then renumber the rows from 0
df = df.drop([0, 17, 40, 48, 59, 63, 64, 70], axis=0).reset_index(drop=True)

df
```

| | point_of_entry_name | office_name | office_code | traffic | port_code | port_type | territory |
|---|---|---|---|---|---|---|---|
| 0 | AJACCIO | AJACCIO | FR000040 | MARITIME (P) ROUTIER (R) | FRAJA | SEA PORT | MAINLAND |
| 1 | BASTIA | BASTIA PORT | FR000380 | MARITIME (P) ROUTIER (R) | FRBIA | SEA PORT | MAINLAND |
| 2 | BAYONNE | BAYONNE | FR000390 | MARITIME (P) ROUTIER (R) | FRBAY | SEA PORT | MAINLAND |
| 3 | BERRE LETANG (RAFFINERIE) | PORT-DE-BOUC PORT | FR003620 | MARITIME (P) ROUTIER (R) | FRETB | SEA PORT | MAINLAND |
| 4 | BORDEAUX | BASSENS | FR000610 | MARITIME (P) ROUTIER (R) | FRBAS FRBSE | SEA PORT | MAINLAND |
| 5 | BOULOGNE-SUR-MER | BOULOGNE | FR000630 | MARITIME (P) ROUTIER (R) | FRBOL | SEA PORT | MAINLAND |
| 6 | BREST | BREST | FR000690 | MARITIME (P) ROUTIER (R) | FRBES | SEA PORT | MAINLAND |
| 7 | CAEN OUISTREHAM | CAEN | FR000720 | MARITIME (P) ROUTIER (R) | FRCFR | SEA PORT | MAINLAND |
| 8 | CALAIS | BOULOGNE | FR000630 | MARITIME (P) ROUTIER (R) | FRCQF | SEA PORT | MAINLAND |
| 9 | CANNES | CANNES | FR000800 | MARITIME (P) ROUTIER (R) | FRCEQ | SEA PORT | MAINLAND |
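The index ranges above are hard-coded, so they would silently go wrong if a new edition of the PDF gained or lost a row. A sketch of an alternative derives `port_type` and `territory` from the heading rows themselves with a forward fill. It assumes, as the extracted rows suggest, that heading rows are the ones with no `office_code`, that vector headings start with `VECTEUR`, and that each overseas block starts with a `DOM` heading; the sample is a small subset of the real rows:

```python
import pandas as pd

# Small subset of the cleaned rows, headings included (office_code is missing on headings)
df = pd.DataFrame(
    {
        "point_of_entry_name": [
            "VECTEUR MARITIME (1)", "AJACCIO", "VECTEUR FLUVIO-MARITIME",
            "MACON", "DOM", "DEGRAD DES CANNES (GUYANE)",
        ],
        "office_code": [None, "FR000040", None, "FR000860", None, "FR006370"],
    }
)

name = df["point_of_entry_name"]
is_heading = df["office_code"].isna()
is_vector = is_heading & name.str.startswith("VECTEUR")
is_dom = is_heading & (name == "DOM")

# Carry each vector heading down to the rows below it, then translate it
vector_labels = {"VECTEUR MARITIME (1)": "SEA PORT", "VECTEUR FLUVIO-MARITIME": "SEA AND RIVER PORT"}
df["port_type"] = name.where(is_vector).ffill().map(vector_labels)

# A vector heading starts a mainland block; a DOM heading starts an overseas block
territory = pd.Series(None, index=df.index, dtype="object")
territory[is_vector] = "MAINLAND"
territory[is_dom] = "OVERSEAS"
df["territory"] = territory.ffill()

# Keep only the port rows and renumber them
ports = df[~is_heading].reset_index(drop=True)
print(ports)
```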


In [21]:
```python
# List the unique values in column "traffic" before translating them into English
sorted(df.traffic.unique())
```

```
['MARITIME (P)  FLUVIAL (C) ROUTIER (R)',
 'MARITIME (P)  FLUVIAL (C) ROUTIER (R)(2)',
 'MARITIME (P)  ROUTIER (R)',
 'MARITIME (P) FLUVIAL (C)',
 'MARITIME (P) FLUVIAL (C) ROUTIER (R)',
 'MARITIME (P) ROUTIER (R)']
```
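The double spaces in some of these values come from cells such as `Maritime (P) \nFluvial (C)`, where a space already preceded the line break that was replaced. Collapsing every run of whitespace into a single space would leave one spelling per combination. A standard-library sketch (`normalize_spaces` is a hypothetical name); in pandas, the equivalent would be `df.traffic.str.replace(r"\s+", " ", regex=True).str.strip()`:

```python
import re


def normalize_spaces(text):
    # Turn any run of spaces, tabs or line breaks into a single space
    return re.sub(r"\s+", " ", text).strip()


raw_values = [
    "Maritime (P) \nFluvial (C)\nRoutier (R)",
    "Maritime (P)\nFluvial (C)",
    "Maritime (P) \nRoutier (R)",
]
normalized = [normalize_spaces(value) for value in raw_values]
print(normalized)
# ['Maritime (P) Fluvial (C) Routier (R)', 'Maritime (P) Fluvial (C)', 'Maritime (P) Routier (R)']
```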

In [22]:
```python
# Map each French traffic label to its English translation
traffic_translations = {
    "MARITIME (P)  FLUVIAL (C) ROUTIER (R)": "SEA (P) RIVER (C) ROAD (R)",
    "MARITIME (P) FLUVIAL (C) ROUTIER (R)": "SEA (P) RIVER (C) ROAD (R)",
    "MARITIME (P)  FLUVIAL (C) ROUTIER (R)(2)": "SEA (P) RIVER (C) ROAD (R)(2)",
    "MARITIME (P) FLUVIAL (C)": "SEA (P) RIVER (C)",
    "MARITIME (P)  ROUTIER (R)": "SEA (P) ROAD (R)",
    "MARITIME (P) ROUTIER (R)": "SEA (P) ROAD (R)",
}
df["traffic"] = df.traffic.replace(traffic_translations)

sorted(df.traffic.unique())
```

```
['SEA (P) RIVER (C)',
 'SEA (P) RIVER (C) ROAD (R)',
 'SEA (P) RIVER (C) ROAD (R)(2)',
 'SEA (P) ROAD (R)']
```
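Listing every full label works for six values, but each new combination in a future edition of the PDF would need its own entry. An alternative sketch translates word by word, so any combination of the known words is covered. `translate_traffic` and the word list are assumptions based only on the labels seen above:

```python
import re

# French traffic words seen in the PDF and their English equivalents
TRAFFIC_WORDS = {"MARITIME": "SEA", "FLUVIAL": "RIVER", "ROUTIER": "ROAD"}


def translate_traffic(label):
    # Collapse repeated spaces, then swap each known French word for its translation
    label = re.sub(r"\s+", " ", label).strip()
    pattern = r"\b(" + "|".join(TRAFFIC_WORDS) + r")\b"
    return re.sub(pattern, lambda match: TRAFFIC_WORDS[match.group(1)], label)


print(translate_traffic("MARITIME (P)  FLUVIAL (C) ROUTIER (R)(2)"))
# SEA (P) RIVER (C) ROAD (R)(2)
```

Applied to the data frame, this would be `df["traffic"] = df.traffic.map(translate_traffic)`.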

### 8. Do some basic analysis

Run a quick analysis and write _a complete journalistic sentence_ reporting the number of sea ports located in French overseas territories that are first points of entry into the European Union, as well as the percentage of all ports in the data frame they represent.

In [24]:
```python
# What is the total number of ports (how many records are there?)
record_count = len(df)

# Filter for only sea ports in French overseas territories
sea_ports = df[(df.port_type == "SEA PORT") & (df.territory == "OVERSEAS")]

# How many of those are there?
sea_ports_count = len(sea_ports)

# Calculate the percentage of the whole
pct_whole = (sea_ports_count / record_count) * 100
```

Formulate a journalistic sentence.

In [25]:
```python
# Write out a formatted sentence using an f-string
story_sentence = f'Of the {record_count:,} French ports which constitute the first points of entry into the European Union, {sea_ports_count:,} ({pct_whole:0.2f}%) are sea ports located in French overseas territories.'

print(story_sentence)
```

Of the 66 French ports which constitute the first points of entry into the European Union, 10 (15.15%) are sea ports located in French overseas territories.
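The same kind of count can be read from a cross-tabulation of `port_type` against `territory`, which shows all four groups at once. A sketch on a small made-up frame; with the real data you would pass `df.port_type` and `df.territory` instead:

```python
import pandas as pd

# Made-up rows standing in for the cleaned data frame
sample = pd.DataFrame(
    {
        "port_type": ["SEA PORT", "SEA PORT", "SEA PORT", "SEA AND RIVER PORT"],
        "territory": ["MAINLAND", "OVERSEAS", "MAINLAND", "OVERSEAS"],
    }
)

# Count rows per (port_type, territory) pair; missing pairs get 0
counts = pd.crosstab(sample.port_type, sample.territory)

# Share of all rows in each cell, as a percentage
shares = (counts / len(sample) * 100).round(2)

print(counts)
print(shares)
```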


### 9. Write the data to a CSV file

In [26]:
```python
df.to_csv("list-of-french-points-of-entry-in-ue-ports.csv", sep=",", encoding="utf-8", index=False)
```

In [27]:
```python
data = pd.read_csv("list-of-french-points-of-entry-in-ue-ports.csv")

data
```

| | point_of_entry_name | office_name | office_code | traffic | port_code | port_type | territory |
|---|---|---|---|---|---|---|---|
| 0 | AJACCIO | AJACCIO | FR000040 | SEA (P) ROAD (R) | FRAJA | SEA PORT | MAINLAND |
| 1 | BASTIA | BASTIA PORT | FR000380 | SEA (P) ROAD (R) | FRBIA | SEA PORT | MAINLAND |
| 2 | BAYONNE | BAYONNE | FR000390 | SEA (P) ROAD (R) | FRBAY | SEA PORT | MAINLAND |
| 3 | BERRE LETANG (RAFFINERIE) | PORT-DE-BOUC PORT | FR003620 | SEA (P) ROAD (R) | FRETB | SEA PORT | MAINLAND |
| 4 | BORDEAUX | BASSENS | FR000610 | SEA (P) ROAD (R) | FRBAS FRBSE | SEA PORT | MAINLAND |
| 5 | BOULOGNE-SUR-MER | BOULOGNE | FR000630 | SEA (P) ROAD (R) | FRBOL | SEA PORT | MAINLAND |
| 6 | BREST | BREST | FR000690 | SEA (P) ROAD (R) | FRBES | SEA PORT | MAINLAND |
| 7 | CAEN OUISTREHAM | CAEN | FR000720 | SEA (P) ROAD (R) | FRCFR | SEA PORT | MAINLAND |
| 8 | CALAIS | BOULOGNE | FR000630 | SEA (P) ROAD (R) | FRCQF | SEA PORT | MAINLAND |
| 9 | CANNES | CANNES | FR000800 | SEA (P) ROAD (R) | FRCEQ | SEA PORT | MAINLAND |
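When the CSV is read back, pandas infers each column's type. Every column here is text, but passing `dtype=str` makes that explicit and would keep any code column that happens to look numeric from being converted. A quick round-trip check, using an in-memory buffer (`io.StringIO`) instead of the real file and a one-row sample frame:

```python
import io

import pandas as pd

sample = pd.DataFrame(
    {
        "point_of_entry_name": ["BORDEAUX"],
        "office_code": ["FR000610"],
        "port_code": ["FRBAS FRBSE"],
    }
)

# Write to an in-memory buffer instead of a file, without the index column
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)

# Read it back, keeping every column as text
restored = pd.read_csv(buffer, dtype=str)

# Raises an AssertionError if the round trip changed anything
pd.testing.assert_frame_equal(sample, restored)
print(restored)
```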
