# Initial subsetting script

Script used for the first session of the Cycle Data. It takes as input the trans-Atlantic voyages database downloaded from the [Slave Voyages](https://www.slavevoyages.org/voyage/database) project website in November 2022. This tabular database is stored as [../data/slave-voyages_trans-atlantic-db.csv](../data/slave-voyages_trans-atlantic-db.csv).

The operations rely on the Pandas library and are as follows:
* import the tabular data,
* display the table's variables (column headers),
* check the first 10 rows,
* create a table containing only the entries whose "Flag of vessel" value is "Netherlands", and only the columns "Flag of vessel", "Date vessel departed with captives", "Vessel name" and "Captain's name", then export it to CSV,
* create a table summarizing the "Flag of vessel" column, then export it to CSV,
* check the possible values of the "Captain's name" variable.

### Library imports

In [1]:
import os
from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import csv
import re

### Importing the data and checking the import

In [5]:
# Import the data.

slave_voyages_orig = pd.read_csv(
    "../data/slave-voyages_trans-atlantic-db.csv",
    skip_blank_lines=False,
    # Note: infer_datetime_format has no effect without parse_dates
    # and is deprecated since pandas 2.0.
    infer_datetime_format=True,
    na_filter=False
)

# Check the available variables.

for header in list(slave_voyages_orig):
    print(header)

# Check the first 10 rows.

slave_voyages_orig.head(10)

Captain's name
Crew deaths during voyage
Crew at first landing of captives
Crew at voyage outset
Date vessel departed with captives
Date vessel departed for homeport
Display in compact mode
Date vessel arrived with captives
First place where captives were landed
First place where captives were purchased
Guns mounted
Cargo
Year of arrival at port of disembarkation
Voyage duration, homeport to disembarkation (in days)
Place where vessel's voyage began
Principal place where captives were purchased
Principal place where captives were landed
Total embarked
Total disembarked
Captive deaths during crossing
Percentage of captives who died during crossing
Captive Background
Flag of vessel
Percent boys
Percent children
Percent girls
Percent males
Percent men
Percent women
Sterling cash price in Jamaica
Duration of captives' crossing (in days)
Flag of vessel.1
Captives carried from 1st port
Captives carried from 2nd port
Captives carried from 3rd port
Captives landed at 1st port
Captives landed a

Unnamed: 0,Captain's name,Crew deaths during voyage,Crew at first landing of captives,Crew at voyage outset,Date vessel departed with captives,Date vessel departed for homeport,Display in compact mode,Date vessel arrived with captives,First place where captives were landed,First place where captives were purchased,...,Voyage ID,Year constructed,Voyage itinerary imputed port where began (ptdepimp) place,Voyage itinerary imputed principal place of slave purchase (mjbyptimp),Voyage itinerary imputed principal port of slave disembarkation (mjslptimp) place,Voyage links,Voyage ship place where vessel constructed,Voyage itinerary first place of slave purchase (plac1tra),Voyage itinerary first place of slave landing (sla1port),Voyage ship place where vessel registered
0,"Velde, Daniel ter",,,54.0,1732-09-03T00:00:00Z,1733-04-24T00:00:00Z,,1732-11-18T00:00:00Z,,,...,10353,,Texel,Elmina,"Suriname, place unspecified",,,Elmina,"Suriname, place unspecified",
1,"Hoeven, Jan van der<br/> Wenman, Roelof",,,23.0,1706-09-09T00:00:00Z,,,1706-11-30T00:00:00Z,,,...,10354,,Hellevoetsluis,Elmina,Curaçao,,,Elmina,Curaçao,
2,,,,,,,,,,,...,10355,,,Ardra,,,,Ardra,,
3,"Scheij, Pieter",,,,,1688-01-01T00:00:00Z,,1688-01-01T00:00:00Z,,,...,10356,,Texel,"Gold Coast, port unspecified",Curaçao,,,,Curaçao,
4,"Stoop, Pieter<br/> Crans, Pieter",,,,1700-12-15T00:00:00Z,1701-01-01T00:00:00Z,,1701-06-23T00:00:00Z,,,...,10357,,Hellevoetsluis,"Whydah, Ouidah",Curaçao,,,"Whydah, Ouidah",Curaçao,
5,"Crans, Pieter",,,,,1703-01-01T00:00:00Z,,1702-11-29T00:00:00Z,,,...,10358,,Hellevoetsluis,"Whydah, Ouidah","Suriname, place unspecified",,,"Whydah, Ouidah","Suriname, place unspecified",
6,"Leijm, Lodewijk van der",,,,,1705-07-25T00:00:00Z,,1705-05-10T00:00:00Z,,,...,10359,,Hellevoetsluis,Malembo,Curaçao,,,Malembo,Curaçao,
7,"Banckert, Joost",,,,1686-02-14T00:00:00Z,1686-01-01T00:00:00Z,,1686-01-01T00:00:00Z,,,...,10360,,Zeeland,Ardra,Curaçao,,,Ardra,Curaçao,
8,"Oole, Remeus",,,,1688-10-13T00:00:00Z,1689-01-01T00:00:00Z,,1689-01-01T00:00:00Z,,,...,10361,,Vlissingen,Ardra,Curaçao,,,Ardra,,
9,"Engelsman, Samuel<br/> Dronker, Jan",,,,,1683-06-30T00:00:00Z,,1683-01-17T00:00:00Z,,,...,10362,,Texel,"Bight of Benin, place unspecified","Suriname, place unspecified",,,"Bight of Benin, place unspecified","Suriname, place unspecified",


### Restricting the table to vessels, captains, and dates of departure with captives

Note: after this export, the output file was modified with regular-expression search-and-replace operations so that each row holds a single captain (as opposed to `Stoop, Pieter<br/> Crans, Pieter`).

Regexes used to duplicate the rows, keeping only one captain in each resulting row, and then to remove the identifier column:
1. Search: `^([^,]*,[^,]*,[^,]*,[^,]*,")([^<\n]+)<br/> ([^"<]+)<br/> ([^"<]+)"` - Replace: `\1\2"\n\1\3"\n\1\4"` (rows with three captains)
2. Search: `^([^,]*,[^,]*,[^,]*,[^,]*,")([^<\n]+)<br/> ([^"<]+)"` - Replace: `\1\2"\n\1\3"` (rows with two captains)
3. Search: `^\d+,` - Replace: nothing.
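The same result can probably be obtained directly in Pandas, without editing the exported file by hand. The block below is a minimal sketch, not the method actually used for this session: the helper name `one_captain_per_row` and the three sample rows are hypothetical, and it assumes that captains are always separated by `<br/>`. It splits the "Captain's name" cell on that separator, repeats the other columns for each captain with `explode`, and exports without the index column, which makes step 3 unnecessary.

```python
import pandas as pd

# Hypothetical sample mirroring the four exported columns.
sample = pd.DataFrame(
    {
        "Flag of vessel": ["Netherlands", "Netherlands", "Netherlands"],
        "Date vessel departed with captives": [
            "1732-09-03T00:00:00Z",
            "1700-12-15T00:00:00Z",
            "",
        ],
        "Vessel name": [
            "Waartwijk",
            "Wapen van Holland (a) Hollandia",
            "Wapen van Amsterdam",
        ],
        "Captain's name": [
            "Velde, Daniel ter",
            "Stoop, Pieter<br/> Crans, Pieter",
            "",
        ],
    }
)


def one_captain_per_row(df, column="Captain's name"):
    """Return a copy of df in which each row holds a single captain."""
    out = df.copy()
    # Split on the "<br/>" separator, tolerating surrounding whitespace.
    out[column] = out[column].str.split(r"\s*<br/>\s*", regex=True)
    # explode() repeats the other columns once per list element.
    return out.explode(column, ignore_index=True)


exploded = one_captain_per_row(sample)

# index=False leaves out the identifier column that step 3 removes.
csv_text = exploded.to_csv(index=False)
print(csv_text)
```

Unlike the two search patterns above, which cover rows with two or three captains, this sketch handles any number of captains per cell. Rows with an empty captain field are kept unchanged.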

In [32]:
# Restrict the columns.

sv_summary = slave_voyages_orig[["Flag of vessel", 'Date vessel departed with captives', 'Vessel name', "Captain's name"]]

# Restrict the rows.

sv_netherlands = sv_summary[sv_summary['Flag of vessel'] == 'Netherlands']

# Export to CSV.

sv_netherlands.to_csv('netherlands.csv')

# Check the first 10 rows.

sv_netherlands.head(10)

Unnamed: 0,Flag of vessel,Date vessel departed with captives,Vessel name,Captain's name
0,Netherlands,1732-09-03T00:00:00Z,Waartwijk,"Velde, Daniel ter"
1,Netherlands,1706-09-09T00:00:00Z,Wakende Kraan,"Hoeven, Jan van der<br/> Wenman, Roelof"
2,Netherlands,,Wapen van Amsterdam,
3,Netherlands,,Wapen van Amsterdam,"Scheij, Pieter"
4,Netherlands,1700-12-15T00:00:00Z,Wapen van Holland (a) Hollandia,"Stoop, Pieter<br/> Crans, Pieter"
5,Netherlands,,Wapen van Holland (a) Hollandia,"Crans, Pieter"
6,Netherlands,,Wapen van Holland (a) Hollandia,"Leijm, Lodewijk van der"
7,Netherlands,1686-02-14T00:00:00Z,Wapen van Zierikzee,"Banckert, Joost"
8,Netherlands,1688-10-13T00:00:00Z,Wapen van Zierikzee,"Oole, Remeus"
9,Netherlands,,Welvaren,"Engelsman, Samuel<br/> Dronker, Jan"


### Creating a table summarizing the "Flag of vessel" variable

In [33]:
slave_voyages_orig["Flag of vessel"].value_counts().reset_index().to_csv('ship_flags.csv')
slave_voyages_orig["Flag of vessel"].value_counts().head(10)

Great Britain        11994
Portugal / Brazil    11334
France                4203
U.S.A.                2276
                      2209
Spain / Uruguay       1928
Netherlands           1699
Denmark / Baltic       413
Other                   16
Sweden                   1
Name: Flag of vessel, dtype: int64
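The unlabeled row with 2209 entries in the output above is presumably a consequence of `na_filter=False` in the import cell: empty cells are read as empty strings and `value_counts()` counts them like any other value. The block below is a minimal sketch on a hypothetical three-row CSV (the rows and the label "(not recorded)" are assumptions made for illustration) comparing the two import modes and showing one way to label the empty value explicitly.

```python
import io

import pandas as pd

# Hypothetical three-row CSV; the second row has no flag.
raw = (
    "Flag of vessel,Vessel name\n"
    "Netherlands,Waartwijk\n"
    ",Unknown ship\n"
    "France,Abeille\n"
)

# With na_filter=False, the empty cell is kept as the string "".
as_strings = pd.read_csv(io.StringIO(raw), na_filter=False)
counts_strings = as_strings["Flag of vessel"].value_counts()

# With the default settings, the empty cell becomes NaN,
# which value_counts() leaves out unless dropna=False is passed.
as_missing = pd.read_csv(io.StringIO(raw))
counts_missing = as_missing["Flag of vessel"].value_counts()

# Give the empty value an explicit label before counting.
labelled = (
    as_strings["Flag of vessel"]
    .replace("", "(not recorded)")
    .value_counts()
)

print(counts_strings)
print(counts_missing)
print(labelled)
```

Labelling the empty value before the export would make the corresponding row in `ship_flags.csv` easier to read.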

### Checking the values of the captains' names

The list shows possible duplicates, but we cannot be sure that they refer to the same person because these are historical data.

In [9]:
for name in np.unique(slave_voyages_orig["Captain's name"]):
    print(name)


(O Procurador)
?, Manoel Jose da
Abarou, Santiago de
Abarroa, Manoel
Abarron, F D<br/> Basagoyti, Felipe
Abautret
Abautret, Jean
Abbis, Thomas
Abborne (a) Abam
Abbot, Richard
Abdiel
Abeille, Joseph<br/> Duranteau, Jacques
Abel, Jean van
Abella, John
Abercrombie, Alexander
Abona, Ignacio
Abraes, Francisco Henriques<br/> Pereira, José Álvares
Abraham, Jean
Abraham, Jean<br/> Berthomé, J
Abraham, Jean<br/> Humphry, Jean-Amaury
Abraham, Ralph
Abraham, Woodward<br/> Calba
Abram, Ralph
Abram, Ralph<br/> Paisley
Abrard, Pedro
Abrego, Agustin de<br/> Martin Lopez
Abreu, Antonio José de
Abreu, Antonio de
Abreu, Antônio José de
Abreu, F F de
Abreu, Francisco Xavier de
Abreu, Francisco de
Abreu, Frutuoso de
Abreu, Joaquim José Pereira de
Abreu, Jose
Abreu, José Alvares
Abreu, José das Neves de
Abreu, José Álvares de
Abreu, José Álvares de<br/> Macedo, Lourenço José de
Abreu, João Dorneles de
Abreu, João Gregório de
Abreu, João Teixeira de
Abreu, L A de
Abreu, Luís
Abreu, Luís da Silva
Abreu, Man

Hendricksz, Hendrick
Hendrix, Arend
Hendy, John<br/> Lewis, John
Henin
Hennchement, Denis
Henneguy, J-Fr
Henneguy, J-Fr (a) Hennequin
Henricheman, David
Henriques de Apostre, Juan<br/> Yrigoyen, Martín de
Henriques, Francisco de Freitas
Henriques, Francisco de Freitas<br/> Cruz, Domingos Luis da
Henriques, Francisco de Freitas<br/> Lima, João Pereira
Henriques, Francisco de Freitas<br/> Moura, Antônio Gonçalves
Henriques, J H
Henriques, José Maria
Henriques, João Militão
Henriques, João do Nascimento
Henriques, Juan<br/> Jeronimo Vendugo, Miguel
Henriques, Manoel
Henriques, Simão Lopes
Henry
Henry, Don
Henry, Juan
Henry, René Marie
Henshall, Philip
Henshall, Phillip
Hensley, Samuel
Hensley, Samuel<br/> Roberts, William
Henty, John
Henty, John<br/> Squerrel, Francis
Hepton, John
Heraud, J-M
Herault, Et
Herault, Louis
Heray
Herbert
Herbert Pradelan
Herbert Pradelan, Fr
Herbert de Pradelan, Fr
Herbert, Fr
Herbert, Thomas
Herblin, Ch
Herblin, Charles
Hercouët
Hercouët, Jean-Gabriel
Hercula

Roche, Nicolas
Roche, Patricio
Rochefort, Bartholomew
Rocher, Jean-Baptiste
Rocher, Thomas
Rochiers, Jan
Rochodel, Lourenço Antônio
Rockliffe, Thomas<br/> McCartney, Thomas
Rockliffe, Thomas<br/> Taylor
Rodden, William
Roder, Francisco
Rodman, Elisha C<br/> Shearman, Benjamin
Rodman, Thom
Rodman, William
Rodon, Robert
Rodovalho Fernandes, Gaspar
Rodres, Francisco
Rodrigo, Bartolomé
Rodrigo, José<br/> Monteiro, Pedro dos Passos
Rodrigo,Sebastián
Rodrigue, Amable
Rodrigue, Cel-Em
Rodrigue, Céleste
Rodrigue, V
Rodrigue<br/> Mallet, Joseph<br/> Audebert
Rodrigues
Rodrigues Barrero, Miguel
Rodrigues de Cuba, Miguel
Rodrigues, A J
Rodrigues, Amaro
Rodrigues, André
Rodrigues, Antonio
Rodrigues, António
Rodrigues, Antônio
Rodrigues, Antônio Feliciano
Rodrigues, Antônio José
Rodrigues, Aurélio
Rodrigues, B
Rodrigues, Bartolome
Rodrigues, Bento
Rodrigues, C J
Rodrigues, Cosme José
Rodrigues, Cosme José<br/> Almeida, José G.
Rodrigues, Cristobal
Rodrigues, Diogo
Rodrigues, Domingo
Rodrigues, Este
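Some of the possible duplicates in this list differ only by accents or spacing, for example `Abreu, Antonio José de` and `Abreu, Antônio José de`. The block below is a minimal sketch for listing such candidates; the helper names `normalise` and `candidate_duplicates` are hypothetical, and the sample names are copied from the output above. It only groups names that become identical once accents, case and spacing are ignored, and it does not establish that two entries refer to the same person.

```python
import unicodedata
from collections import defaultdict


def normalise(name):
    """Lower-case a name, strip its accents and normalise its spacing."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Ensure a space after each comma, then collapse repeated whitespace.
    return " ".join(no_accents.lower().replace(",", ", ").split())


def candidate_duplicates(names):
    """Group names sharing the same normalised form (candidates only)."""
    groups = defaultdict(set)
    for name in names:
        groups[normalise(name)].add(name)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Sample names taken from the output above.
names = [
    "Abreu, Antonio José de",
    "Abreu, Antônio José de",
    "Rodrigues, Antonio",
    "Rodrigues, António",
    "Rodrigues, Antônio",
    "Rodrigo,Sebastián",
    "Henshall, Philip",
    "Henshall, Phillip",
]

dupes = candidate_duplicates(names)
for key, variants in dupes.items():
    print(key, "->", variants)
```

Spelling variants such as `Henshall, Philip` and `Henshall, Phillip` are not caught by this normalisation; detecting them would need approximate string matching, and each candidate would still have to be checked against the sources.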