### Data cleaning

This notebook cleans the scraped dataset stored as `scraped_data.csv`. It inspects the dataframe to check that the datatypes are correct and that the information it contains is useful and readable in other tools such as SQL and Power BI.

In [1]:
import pandas as pd
import numpy as np

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [2]:
df = pd.read_csv('../data/scraped_data.csv')
df.head()

Unnamed: 0,date,lat_and_long,GTOA_Protocol,Boat_Type,Boat_Length,Towing_Inflatable,Trailing_Fishing_Lure,Physical_Contact_With_Boat,Number_of_Adult_Orcas,Number_of_Juvenile_Orcas,Number_of_Uncertain_Age_Orcas,Rudder,Motoring_or_Sailing,Speed_Knots,Sea_State,Wind_Speed_Beaufort,Daylight_or_Darkness,Cloud_Cover,Distance_Off_Land_NM,Depth_Meters,Depth_Gauge,Autopilot,Hull_Topsides_Color,Antifoul_Color,Boat_Damaged,Tow_Required,Crew_Response,Orcas_Behaviour
0,2023-11-01 22:15:00,"32 47.4980 N, 9 54.3980 W","No, did not follow protocol, interaction laste...",Sail,10 - 12.5m,Not towing,No,No,1,0,0,Spade,Sailing,5 - 7,Moderate,5 - 6 (17 - 27 knots),Night,0 - 25%,Over 10,200m+,On,On,White/light,Blue,No,No,"Orca interaction at 10:15pm on 01/11, 40 miles...",I would describe the behaviour of the Orca dur...
1,2023-10-31 07:50:00,"39 26.0000 N, 9 23.0000 W","Yes, followed protocol, interaction lasted les...",Sail,12.5 - 15m,Not towing,No,Yes,2,5,0,Twin rudder,Motoring,5 - 7,Rough,3 - 4 (7 - 16 knots),Day,50 - 75%,2 - 5,40 - 200m,On,On,White/light,Black,"Yes, moderate - immediate repairs required",No,We had sandbags on our sugar scoops and metal ...,Juveniles hitting the rudders adults close by
2,2023-09-19 11:00:00,"37 40.0000 N, 8 54.0000 W","No, did not follow protocol, interaction laste...",Sail,12.5 - 15m,Not towing,No,Yes,1,0,0,Spade,Motoring,5 - 7,Calm,3 - 4 (7 - 16 knots),Day,0 - 25%,2 - 5,40 - 200m,On,On,White/light,Other,"Yes, moderate - immediate repairs required",No,We saw the orca approach from 10 o’clock posit...,There was an initial approach 45 minutes earli...
3,2023-09-01 13:15:00,"45 36.0000 N, 3 45.0000 W","Yes, followed protocol, interaction lasted 10 ...",Sail,Over 15m,Not towing,Yes,Yes,1,2,0,Spade,Sailing,3 - 4,Calm,3 - 4 (7 - 16 knots),Day,25 - 50%,Over 10,200m+,Off,Off,White/light,Black,"Yes, moderate - immediate repairs required",No,Les trois orques passent constamment de bâbord...,Pas de comportement visblement agressif./// No...
4,2023-09-02 03:45:00,"42 45.0000 N, 9 14.0000 W","Yes, followed protocol, interaction lasted les...",Sail,12.5 - 15m,Not towing,No,Yes,1,2,0,Spade,Motorsailing,5 - 7,Calm,0 - 2 (0 - 6 knots),Night,0 - 25%,5 - 10,40 - 200m,On,On,White/light,Black,"Yes, moderate - immediate repairs required",Yes,Arrêt du pilote automatique a la 2 eme interac...,Approche furtive à la première interaction dir...


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 154 entries, 0 to 153
Data columns (total 28 columns):
 #   Column                         Non-Null Count  Dtype 
---  ------                         --------------  ----- 
 0   date                           154 non-null    object
 1   lat_and_long                   154 non-null    object
 2   GTOA_Protocol                  154 non-null    object
 3   Boat_Type                      154 non-null    object
 4   Boat_Length                    154 non-null    object
 5   Towing_Inflatable              154 non-null    object
 6   Trailing_Fishing_Lure          154 non-null    object
 7   Physical_Contact_With_Boat     154 non-null    object
 8   Number_of_Adult_Orcas          154 non-null    object
 9   Number_of_Juvenile_Orcas       154 non-null    object
 10  Number_of_Uncertain_Age_Orcas  154 non-null    object
 11  Rudder                         154 non-null    object
 12  Motoring_or_Sailing            154 non-null    object
 13  Speed

We have 28 columns, all of type object because of how the information was scraped in the `1.WebScraping` notebook. Among other things, the plan is to:

* Delete rows that contain no information or that are repeated
* Fix any rows whose values ended up in the wrong columns
* Split values into two columns where needed, for example latitude and longitude, which currently share a single column
* Change the column datatypes. Currently every column is of type object/string.
* The data comes from a form filled in by the skippers of the boats that had an interaction with orcas. Because most questions offered a fixed set of options (rather than free text), it will be useful to convert the categorical variables into numeric ones via one-hot encoding. This is a key step towards applying predictive models later.
* Identify and reject outliers, if any, always weighing whether to discard them or whether they carry useful information.
* Check the consistency of the database.
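
As an illustration of the latitude/longitude split listed above, here is a minimal sketch. It assumes every value follows the `degrees minutes hemisphere` pattern seen in the sample rows (for example `32 47.4980 N, 9 54.3980 W`) and that rows without coordinates have already been removed. The function name `parse_lat_long` and the output column names are hypothetical.

```python
import pandas as pd

def parse_lat_long(value):
    """Convert 'DD MM.MMMM H, DD MM.MMMM H' into decimal degrees (hypothetical helper)."""
    coords = []
    for part in value.split(','):
        degrees, minutes, hemisphere = part.split()
        decimal = float(degrees) + float(minutes) / 60
        # South and West are negative by convention
        if hemisphere in ('S', 'W'):
            decimal = -decimal
        coords.append(round(decimal, 6))
    return pd.Series(coords, index=['latitude', 'longitude'])

# Two coordinate strings copied from the dataset preview
sample = pd.Series(['32 47.4980 N, 9 54.3980 W', '35 52.0000 N, 6 1.1000 W'])
coordinates = sample.apply(parse_lat_long)
print(coordinates)
```

Decimal degrees are convenient because mapping tools such as Power BI generally expect numeric latitude and longitude fields.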

#### 1. Column descriptions

Below is a description of the columns originally present in our dataset:

* **date**: Approximate date and time of the interaction
* **lat_and_long**: Latitude and longitude where the interaction took place
* **GTOA_Protocol**: Whether the GT Orca Atlántica *protocol* was followed (lower the sails, stop the boat, switch off the engine and keep a low profile), together with the *duration* of the interaction
* **Boat_Type**: Boat type - Sail | Motor | Fishing Vessel
* **Boat_Length**: Boat length (m) - Under 10m | 10 - 12.5m | 12.5 - 15m | Over 15m
* **Towing_Inflatable**: Was the boat towing an inflatable dinghy?
* **Trailing_Fishing_Lure**: Was the boat trailing a fishing lure?
* **Physical_Contact_With_Boat**: Did the orcas make physical contact with the boat?
* **Number_of_Adult_Orcas**: Number of adult orcas
* **Number_of_Juvenile_Orcas**: Number of juvenile orcas
* **Number_of_Uncertain_Age_Orcas**: Number of orcas of uncertain age
* **Rudder**: Rudder type - Spade | Semi skeg | Full skeg | Twin rudder | Keel hung
* **Motoring_or_Sailing**: Propulsion at the time - Sailing | Motoring | Motorsailing | Hove-to
* **Speed_Knots**: Speed (kts)
* **Sea_State**: Sea state - Calm | Moderate | Rough
* **Wind_Speed_Beaufort**: Wind speed (Beaufort scale) - 0-2 | 3-4 | 5-6 | 7+
* **Daylight_or_Darkness**: Time of day - Dawn | Day | Dusk | Night
* **Cloud_Cover**: Cloud cover - 0-25% | 25-50% | 50-75% | 75-100%
* **Distance_Off_Land_NM**: Distance off land (nm) - 0-2 | 2-5 | 5-10 | Over 10
* **Depth_Meters**: Depth (m) - up to 20m | 20-40m | 40-200m | 200m+
* **Depth_Gauge**: Depth gauge - On | Off
* **Autopilot**: Autopilot - On | Off
* **Hull_Topsides_Color**: Hull colour - White/light | Dark
* **Antifoul_Color**: Antifoul colour - Black | Blue | Red | White | Green | Coppercoat | Other
* **Boat_Damaged**: Was the boat damaged, or does it need any repair? - Yes, minor | Yes, moderate | Yes, extensive | No
* **Tow_Required**: Was a tow required? - Yes | No
* **Crew_Response**: Free-text description of the interaction, the actions taken, and whether or not they deterred the orcas.
* **Orcas_Behaviour**: Description of the behaviour of the orca(s)

#### Row 105
* The row with index 105 is misaligned: its values landed in the wrong columns. We shift every value from the 'Rudder' column onwards one cell to the right.
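
To see what a one-cell shift does, here is a minimal sketch on a toy `Series` (the values are illustrative and are not the full contents of row 105). `Series.shift(1)` moves each value to the next label, leaves a missing value in the first position, and drops the last value.

```python
import pandas as pd

# Toy row: every value sits one column too far to the left
row = pd.Series(
    ['Sailing', '5 - 7', 'Calm'],
    index=['Rudder', 'Motoring_or_Sailing', 'Speed_Knots'],
)

# Move every value one position to the right
shifted = row.shift(1)
print(shifted)
```

The first position is left empty after the shift, which is why the 'Rudder' value has to be filled in by hand afterwards.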

In [4]:
print(df.iloc[105])

date                                                           2022-04-13 15:15:00
lat_and_long                                              35 52.0000 N, 6 1.1000 W
GTOA_Protocol                                                                 Sail
Boat_Type                                                               12.5 - 15m
Boat_Length                                                             Not towing
Towing_Inflatable                                                            Spade
Trailing_Fishing_Lure                                                      Unknown
Physical_Contact_With_Boat                                                 Unknown
Number_of_Adult_Orcas                                                            0
Number_of_Juvenile_Orcas                                                         0
Number_of_Uncertain_Age_Orcas                                                    0
Rudder                                                                     Sailing
Moto

In [5]:
index_to_shift = 105

# Get the positional index of the 'Rudder' column
rudder_column_index = df.columns.get_loc('Rudder')

# Apply shift() to this row only, from 'Rudder' onwards
df.iloc[index_to_shift, rudder_column_index:] = df.iloc[index_to_shift, rudder_column_index:].shift(1)

print(df.iloc[index_to_shift])

date                                                           2022-04-13 15:15:00
lat_and_long                                              35 52.0000 N, 6 1.1000 W
GTOA_Protocol                                                                 Sail
Boat_Type                                                               12.5 - 15m
Boat_Length                                                             Not towing
Towing_Inflatable                                                            Spade
Trailing_Fishing_Lure                                                      Unknown
Physical_Contact_With_Boat                                                 Unknown
Number_of_Adult_Orcas                                                            0
Number_of_Juvenile_Orcas                                                         0
Number_of_Uncertain_Age_Orcas                                                    0
Rudder                                                                        None
Moto

Now I will set the following values by hand:
* GTOA_Protocol: 'Unknown'
* Boat_Type: 'Sail'
* Boat_Length: '12.5 - 15m'
* Towing_Inflatable: 'Not towing'
* Rudder: 'Spade'

In [6]:
# Use .loc[] to make the changes
df.loc[105, 'GTOA_Protocol'] = 'Unknown'
df.loc[105, 'Boat_Type'] = 'Sail'
df.loc[105, 'Boat_Length'] = '12.5 - 15m'
df.loc[105, 'Towing_Inflatable'] = 'Not towing'
df.loc[105, 'Rudder'] = 'Spade'

print(df.iloc[105])

date                                                           2022-04-13 15:15:00
lat_and_long                                              35 52.0000 N, 6 1.1000 W
GTOA_Protocol                                                              Unknown
Boat_Type                                                                     Sail
Boat_Length                                                             12.5 - 15m
Towing_Inflatable                                                       Not towing
Trailing_Fishing_Lure                                                      Unknown
Physical_Contact_With_Boat                                                 Unknown
Number_of_Adult_Orcas                                                            0
Number_of_Juvenile_Orcas                                                         0
Number_of_Uncertain_Age_Orcas                                                    0
Rudder                                                                       Spade
Moto

#### Row 153
* We delete this row because every field is 'Unknown'
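
Rather than relying on a known index, rows in which every field is 'Unknown' can also be located programmatically. The following is a minimal sketch on a toy dataframe (the toy values and the variable names are assumptions made for illustration):

```python
import pandas as pd

toy = pd.DataFrame(
    {
        'date': ['2022-06-24 12:50:00', 'Unknown'],
        'Boat_Type': ['Sail', 'Unknown'],
        'Rudder': ['Unknown', 'Unknown'],
    },
    index=[152, 153],
)

# True only for rows where every column holds the string 'Unknown'
all_unknown = (toy == 'Unknown').all(axis=1)
empty_rows = toy.index[all_unknown].tolist()

# Drop those rows; rows with at least one real value are kept
cleaned = toy.drop(index=empty_rows)
print(empty_rows)
```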

In [7]:
print(df.iloc[153])

date                             Unknown
lat_and_long                     Unknown
GTOA_Protocol                    Unknown
Boat_Type                        Unknown
Boat_Length                      Unknown
Towing_Inflatable                Unknown
Trailing_Fishing_Lure            Unknown
Physical_Contact_With_Boat       Unknown
Number_of_Adult_Orcas            Unknown
Number_of_Juvenile_Orcas         Unknown
Number_of_Uncertain_Age_Orcas    Unknown
Rudder                           Unknown
Motoring_or_Sailing              Unknown
Speed_Knots                      Unknown
Sea_State                        Unknown
Wind_Speed_Beaufort              Unknown
Daylight_or_Darkness             Unknown
Cloud_Cover                      Unknown
Distance_Off_Land_NM             Unknown
Depth_Meters                     Unknown
Depth_Gauge                      Unknown
Autopilot                        Unknown
Hull_Topsides_Color              Unknown
Antifoul_Color                   Unknown
Boat_Damaged    

In [8]:
df.drop(index=153, inplace=True)
df.tail()

Unnamed: 0,date,lat_and_long,GTOA_Protocol,Boat_Type,Boat_Length,Towing_Inflatable,Trailing_Fishing_Lure,Physical_Contact_With_Boat,Number_of_Adult_Orcas,Number_of_Juvenile_Orcas,Number_of_Uncertain_Age_Orcas,Rudder,Motoring_or_Sailing,Speed_Knots,Sea_State,Wind_Speed_Beaufort,Daylight_or_Darkness,Cloud_Cover,Distance_Off_Land_NM,Depth_Meters,Depth_Gauge,Autopilot,Hull_Topsides_Color,Antifoul_Color,Boat_Damaged,Tow_Required,Crew_Response,Orcas_Behaviour
148,2022-07-06 07:15:00,"35 59.0000 N, 5 55.0000 W","No, did not follow protocol, interaction laste...",Sail,12.5 - 15m,Not towing,Unknown,Unknown,0,0,0,Twin rudder,Motorsailing,5 - 7,Calm,3 - 4 (7 - 16 knots),Day,0 - 25%,5 - 10,40 - 200m,On,On,White/light,Black,No,No,We were following another sail boat from Porti...,"They appeared from my starboard, went under th..."
149,2022-07-03 18:30:00,"39 24.6240 N, 9 38.4150 W","Yes, followed protocol, interaction lasted 10 ...",Sail,10 - 12.5m,Not towing,Unknown,Unknown,0,0,0,Twin rudder,Sailing,5 - 7,Moderate,3 - 4 (7 - 16 knots),Day,50 - 75%,Over 10,200m+,On,On,White/light,Black,"Yes, extensive - major works required",No,Nous naviguions à la voile avec un catamaran d...,"Relativement calme, allant sous le bateau, don..."
150,2022-07-03 17:50:00,"38 7.2100 N, 9 1.4300 W","Yes, followed protocol, interaction lasted les...",Sail,Under 10m,Not towing,Unknown,Unknown,0,0,0,Spade,Motorsailing,5 - 7,Calm,5 - 6 (17 - 27 knots),Day,25 - 50%,5 - 10,40 - 200m,On,On,White/light,Blue,No,No,I only saw 2 adult females and 2 juveniles. My...,Very placid and almost lethargic. I’ve never s...
151,2022-07-03 13:00:00,"39 21.0000 N, 9 35.0000 W","No, did not follow protocol, interaction laste...",Sail,Over 15m,Not towing,Unknown,Unknown,0,0,0,Twin rudder,Motorsailing,5 - 7,Moderate,7+ (28 knots+),Day,25 - 50%,5 - 10,40 - 200m,On,Off,White/light,Black,"Yes, extensive - major works required",No,The boat is a catamaran with 2 engines and rud...,There were 3 adults and 2 juveniles. The small...
152,2022-06-24 12:50:00,"36 50.0000 N, 8 55.0000 W","No, did not follow protocol, interaction laste...",Sail,10 - 12.5m,Not towing,Unknown,Unknown,0,0,0,Spade,Sailing,5 - 7,Calm,3 - 4 (7 - 16 knots),Day,0 - 25%,Over 10,200m+,On,On,White/light,Blue,No,No,The two orcas were nearby a buoy and as soon a...,The two orcas (one of 6 meters and other of 3-...


## Column-by-column check
* We check the unique values of each column to spot values that do not fit
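
This check can also be automated by comparing each column against the set of answers the form allows. The sketch below is one possible approach: the `EXPECTED` dictionary and the `unexpected_values` helper are hypothetical, and the toy dataframe only mimics the kind of stray value such a check would catch.

```python
import pandas as pd

# Hypothetical vocabulary of valid answers per column
EXPECTED = {
    'Boat_Type': {'Sail', 'Motor', 'Fishing Vessel'},
    'Towing_Inflatable': {
        'Not towing',
        'Towing and interacted with inflatable first',
        'Unknown',
    },
}

toy = pd.DataFrame({
    'Boat_Type': ['Sail', 'Motor', 'Sail'],
    'Towing_Inflatable': ['Not towing', 'Spade', 'Unknown'],
})

def unexpected_values(frame, expected):
    """Return {column: {row_index: value}} for values outside the allowed set."""
    report = {}
    for column, allowed in expected.items():
        bad = frame.loc[~frame[column].isin(allowed), column]
        if not bad.empty:
            report[column] = bad.to_dict()
    return report

report = unexpected_values(toy, EXPECTED)
print(report)
```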

#### GTOA_Protocol

In [9]:
df.GTOA_Protocol.unique()

array(['No, did not follow protocol, interaction lasted less than 10 minutes',
       'Yes, followed protocol, interaction lasted less than 10 minutes',
       'Yes, followed protocol, interaction lasted 10 to 30 minutes',
       'Yes, followed protocol, interaction lasted 30 to 60 minutes',
       'No, did not follow protocol, interaction lasted 10 to 30 minutes',
       'Yes, followed protocol, interaction lasted more than 60 minutes',
       'No, did not follow protocol, interaction lasted more than 60 minutes',
       'No, did not follow protocol, interaction lasted 30 to 60 minutes',
       'Unknown'], dtype=object)

Since the column only contains these unique values, we can easily define a function that splits it into two more useful columns. Each answer tells us both whether the protocol was followed and how long the interaction lasted. We will split the column into 'Followed_GTOA_Protocol' and 'Interaction_time'.

We take advantage of how the answers are built to fill each of the new columns:
* 'Followed_GTOA_Protocol': the first word tells us whether the protocol was followed (``Yes``), was not followed (``No``), or whether the skipper's answer is missing (``Unknown``)

* 'Interaction_time': the last four words let us classify the duration into four time ranges, plus a fallback for missing answers:
1) less than 10 minutes: 0-10
2) 10 to 30 minutes: 10-30
3) 30 to 60 minutes: 30-60
4) more than 60 minutes: 60+
5) Unknown
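
The same split can also be expressed without a row-wise function. The following is a minimal sketch of a vectorised alternative based on `Series.str.extract`; the regular expression and the `duration_labels` dictionary are assumptions derived from the answer strings listed above.

```python
import pandas as pd

answers = pd.Series([
    'Yes, followed protocol, interaction lasted 10 to 30 minutes',
    'No, did not follow protocol, interaction lasted more than 60 minutes',
    'Unknown',
])

# Named groups become column names; rows that do not match yield NaN
pattern = r'^(?P<Followed_GTOA_Protocol>Yes|No),.*lasted (?P<duration>.+)$'
parts = answers.str.extract(pattern)

duration_labels = {
    'less than 10 minutes': '0-10',
    '10 to 30 minutes': '10-30',
    '30 to 60 minutes': '30-60',
    'more than 60 minutes': '60+',
}
parts['Interaction_time'] = parts['duration'].map(duration_labels)

# Anything that could not be parsed is labelled 'Unknown'
result = parts.drop(columns='duration').fillna('Unknown')
print(result)
```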

In [10]:
# Define a function that categorises each answer:
def saca_protocolo_tiempo(value):

    # First, categorise by whether the protocol was followed
    if value.startswith('Yes'):
        protocolo = 'Yes'
    elif value.startswith('No'):
        protocolo = 'No'
    else:
        protocolo = 'Unknown'

    # Then, categorise by the duration of the interaction
    if 'less than 10 minutes' in value:
        interaccion = '0-10'
    elif '10 to 30 minutes' in value:
        interaccion = '10-30'
    elif '30 to 60 minutes' in value:
        interaccion = '30-60'
    elif 'more than 60 minutes' in value:
        interaccion = '60+'
    else:
        interaccion = 'Unknown'

    return pd.Series([protocolo, interaccion], index = ['Followed_GTOA_Protocol', 'Interaction_time'])

In [11]:
# Call the function with apply() and create the two new columns:
df[['Followed_GTOA_Protocol', 'Interaction_time']] = df.GTOA_Protocol.apply(saca_protocolo_tiempo)
df.head()

Unnamed: 0,date,lat_and_long,GTOA_Protocol,Boat_Type,Boat_Length,Towing_Inflatable,Trailing_Fishing_Lure,Physical_Contact_With_Boat,Number_of_Adult_Orcas,Number_of_Juvenile_Orcas,Number_of_Uncertain_Age_Orcas,Rudder,Motoring_or_Sailing,Speed_Knots,Sea_State,Wind_Speed_Beaufort,Daylight_or_Darkness,Cloud_Cover,Distance_Off_Land_NM,Depth_Meters,Depth_Gauge,Autopilot,Hull_Topsides_Color,Antifoul_Color,Boat_Damaged,Tow_Required,Crew_Response,Orcas_Behaviour,Followed_GTOA_Protocol,Interaction_time
0,2023-11-01 22:15:00,"32 47.4980 N, 9 54.3980 W","No, did not follow protocol, interaction laste...",Sail,10 - 12.5m,Not towing,No,No,1,0,0,Spade,Sailing,5 - 7,Moderate,5 - 6 (17 - 27 knots),Night,0 - 25%,Over 10,200m+,On,On,White/light,Blue,No,No,"Orca interaction at 10:15pm on 01/11, 40 miles...",I would describe the behaviour of the Orca dur...,No,0-10
1,2023-10-31 07:50:00,"39 26.0000 N, 9 23.0000 W","Yes, followed protocol, interaction lasted les...",Sail,12.5 - 15m,Not towing,No,Yes,2,5,0,Twin rudder,Motoring,5 - 7,Rough,3 - 4 (7 - 16 knots),Day,50 - 75%,2 - 5,40 - 200m,On,On,White/light,Black,"Yes, moderate - immediate repairs required",No,We had sandbags on our sugar scoops and metal ...,Juveniles hitting the rudders adults close by,Yes,0-10
2,2023-09-19 11:00:00,"37 40.0000 N, 8 54.0000 W","No, did not follow protocol, interaction laste...",Sail,12.5 - 15m,Not towing,No,Yes,1,0,0,Spade,Motoring,5 - 7,Calm,3 - 4 (7 - 16 knots),Day,0 - 25%,2 - 5,40 - 200m,On,On,White/light,Other,"Yes, moderate - immediate repairs required",No,We saw the orca approach from 10 o’clock posit...,There was an initial approach 45 minutes earli...,No,0-10
3,2023-09-01 13:15:00,"45 36.0000 N, 3 45.0000 W","Yes, followed protocol, interaction lasted 10 ...",Sail,Over 15m,Not towing,Yes,Yes,1,2,0,Spade,Sailing,3 - 4,Calm,3 - 4 (7 - 16 knots),Day,25 - 50%,Over 10,200m+,Off,Off,White/light,Black,"Yes, moderate - immediate repairs required",No,Les trois orques passent constamment de bâbord...,Pas de comportement visblement agressif./// No...,Yes,10-30
4,2023-09-02 03:45:00,"42 45.0000 N, 9 14.0000 W","Yes, followed protocol, interaction lasted les...",Sail,12.5 - 15m,Not towing,No,Yes,1,2,0,Spade,Motorsailing,5 - 7,Calm,0 - 2 (0 - 6 knots),Night,0 - 25%,5 - 10,40 - 200m,On,On,White/light,Black,"Yes, moderate - immediate repairs required",Yes,Arrêt du pilote automatique a la 2 eme interac...,Approche furtive à la première interaction dir...,Yes,0-10


In [12]:
df.GTOA_Protocol.value_counts()

GTOA_Protocol
No, did not follow protocol, interaction lasted less than 10 minutes    40
Yes, followed protocol, interaction lasted 10 to 30 minutes             26
Yes, followed protocol, interaction lasted less than 10 minutes         24
No, did not follow protocol, interaction lasted 10 to 30 minutes        24
Yes, followed protocol, interaction lasted 30 to 60 minutes             15
Yes, followed protocol, interaction lasted more than 60 minutes         11
No, did not follow protocol, interaction lasted 30 to 60 minutes         9
No, did not follow protocol, interaction lasted more than 60 minutes     3
Unknown                                                                  1
Name: count, dtype: int64

In [13]:
df.Followed_GTOA_Protocol.value_counts()

Followed_GTOA_Protocol
No         76
Yes        76
Unknown     1
Name: count, dtype: int64

In [14]:
df.Interaction_time.value_counts()

Interaction_time
0-10       64
10-30      50
30-60      24
60+        14
Unknown     1
Name: count, dtype: int64

In [15]:
# Having checked with value_counts() that the data was split correctly, drop the original GTOA_Protocol column and reorder the columns

df.drop(columns='GTOA_Protocol', inplace=True)

column_order = ['date', 'lat_and_long', 'Followed_GTOA_Protocol', 'Interaction_time', 'Boat_Type', 'Boat_Length',
                'Towing_Inflatable', 'Trailing_Fishing_Lure', 'Physical_Contact_With_Boat', 'Number_of_Adult_Orcas',
                'Number_of_Juvenile_Orcas', 'Number_of_Uncertain_Age_Orcas', 'Rudder', 'Motoring_or_Sailing',
                'Speed_Knots', 'Sea_State', 'Wind_Speed_Beaufort', 'Daylight_or_Darkness', 'Cloud_Cover',
                'Distance_Off_Land_NM', 'Depth_Meters', 'Depth_Gauge', 'Autopilot', 'Hull_Topsides_Color',
                'Antifoul_Color', 'Boat_Damaged', 'Tow_Required', 'Crew_Response', 'Orcas_Behaviour']

df = df[column_order]
df.head()

Unnamed: 0,date,lat_and_long,Followed_GTOA_Protocol,Interaction_time,Boat_Type,Boat_Length,Towing_Inflatable,Trailing_Fishing_Lure,Physical_Contact_With_Boat,Number_of_Adult_Orcas,Number_of_Juvenile_Orcas,Number_of_Uncertain_Age_Orcas,Rudder,Motoring_or_Sailing,Speed_Knots,Sea_State,Wind_Speed_Beaufort,Daylight_or_Darkness,Cloud_Cover,Distance_Off_Land_NM,Depth_Meters,Depth_Gauge,Autopilot,Hull_Topsides_Color,Antifoul_Color,Boat_Damaged,Tow_Required,Crew_Response,Orcas_Behaviour
0,2023-11-01 22:15:00,"32 47.4980 N, 9 54.3980 W",No,0-10,Sail,10 - 12.5m,Not towing,No,No,1,0,0,Spade,Sailing,5 - 7,Moderate,5 - 6 (17 - 27 knots),Night,0 - 25%,Over 10,200m+,On,On,White/light,Blue,No,No,"Orca interaction at 10:15pm on 01/11, 40 miles...",I would describe the behaviour of the Orca dur...
1,2023-10-31 07:50:00,"39 26.0000 N, 9 23.0000 W",Yes,0-10,Sail,12.5 - 15m,Not towing,No,Yes,2,5,0,Twin rudder,Motoring,5 - 7,Rough,3 - 4 (7 - 16 knots),Day,50 - 75%,2 - 5,40 - 200m,On,On,White/light,Black,"Yes, moderate - immediate repairs required",No,We had sandbags on our sugar scoops and metal ...,Juveniles hitting the rudders adults close by
2,2023-09-19 11:00:00,"37 40.0000 N, 8 54.0000 W",No,0-10,Sail,12.5 - 15m,Not towing,No,Yes,1,0,0,Spade,Motoring,5 - 7,Calm,3 - 4 (7 - 16 knots),Day,0 - 25%,2 - 5,40 - 200m,On,On,White/light,Other,"Yes, moderate - immediate repairs required",No,We saw the orca approach from 10 o’clock posit...,There was an initial approach 45 minutes earli...
3,2023-09-01 13:15:00,"45 36.0000 N, 3 45.0000 W",Yes,10-30,Sail,Over 15m,Not towing,Yes,Yes,1,2,0,Spade,Sailing,3 - 4,Calm,3 - 4 (7 - 16 knots),Day,25 - 50%,Over 10,200m+,Off,Off,White/light,Black,"Yes, moderate - immediate repairs required",No,Les trois orques passent constamment de bâbord...,Pas de comportement visblement agressif./// No...
4,2023-09-02 03:45:00,"42 45.0000 N, 9 14.0000 W",Yes,0-10,Sail,12.5 - 15m,Not towing,No,Yes,1,2,0,Spade,Motorsailing,5 - 7,Calm,0 - 2 (0 - 6 knots),Night,0 - 25%,5 - 10,40 - 200m,On,On,White/light,Black,"Yes, moderate - immediate repairs required",Yes,Arrêt du pilote automatique a la 2 eme interac...,Approche furtive à la première interaction dir...


#### Boat Type

In [16]:
df.Boat_Type.unique()

array(['Sail', 'Motor', 'Fishing Vessel'], dtype=object)

In [17]:
df.Boat_Type.value_counts()

Boat_Type
Sail              150
Motor               2
Fishing Vessel      1
Name: count, dtype: int64

* There appear to be no null values, and the vast majority of the boats that had interactions with orcas were sailboats (150 of 153).
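
With only three categories, a column like this is a natural candidate for the one-hot encoding planned for later. As a hedged illustration (not necessarily how the later notebooks do it), `pd.get_dummies` turns each category into its own 0/1 column:

```python
import pandas as pd

# Toy sample using the three categories found in Boat_Type
boat_types = pd.Series(['Sail', 'Motor', 'Fishing Vessel', 'Sail'], name='Boat_Type')

# One indicator column per category, stored as integers
encoded = pd.get_dummies(boat_types, prefix='Boat_Type', dtype=int)
print(encoded)
```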

#### Boat length

In [18]:
df.Boat_Length.unique()

array(['10 - 12.5m', '12.5 - 15m', 'Over 15m', 'Under 10m'], dtype=object)

In [19]:
df.Boat_Length.value_counts()

Boat_Length
10 - 12.5m    59
12.5 - 15m    56
Over 15m      30
Under 10m      8
Name: count, dtype: int64

We will change the format of the ranges as follows:

* Under 10m -> 0-10
* 10 - 12.5m -> 10-12.5
* 12.5 - 15m -> 12.5-15
* Over 15m -> 15+
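
The mapping above can also be written as a dictionary and applied with `Series.map`, which keeps all four labels in one place and makes a typo in any single label easier to spot. This is a minimal sketch; the name `LENGTH_LABELS` and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical lookup table mirroring the mapping listed above
LENGTH_LABELS = {
    'Under 10m': '0-10',
    '10 - 12.5m': '10-12.5',
    '12.5 - 15m': '12.5-15',
    'Over 15m': '15+',
}

lengths = pd.Series(['10 - 12.5m', '12.5 - 15m', 'Over 15m', 'Under 10m', 'n/a'])

# Values missing from the dictionary become NaN, then 'Unknown'
relabelled = lengths.map(LENGTH_LABELS).fillna('Unknown')
print(relabelled.tolist())
```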


In [20]:
# Define a relabelling function:
def clean_length(value):

    if value == 'Under 10m':
        return '0-10'
    elif value == '10 - 12.5m':
        return '10-12.5'
    elif value == '12.5 - 15m':
        return '12.5-15m'
    elif value == 'Over 15m':
        return '15+'
    else:
        return 'Unknown'

In [21]:
# Call the function with apply() to relabel the whole column
df.Boat_Length = df.Boat_Length.apply(clean_length)
df.head()

Unnamed: 0,date,lat_and_long,Followed_GTOA_Protocol,Interaction_time,Boat_Type,Boat_Length,Towing_Inflatable,Trailing_Fishing_Lure,Physical_Contact_With_Boat,Number_of_Adult_Orcas,Number_of_Juvenile_Orcas,Number_of_Uncertain_Age_Orcas,Rudder,Motoring_or_Sailing,Speed_Knots,Sea_State,Wind_Speed_Beaufort,Daylight_or_Darkness,Cloud_Cover,Distance_Off_Land_NM,Depth_Meters,Depth_Gauge,Autopilot,Hull_Topsides_Color,Antifoul_Color,Boat_Damaged,Tow_Required,Crew_Response,Orcas_Behaviour
0,2023-11-01 22:15:00,"32 47.4980 N, 9 54.3980 W",No,0-10,Sail,10-12.5,Not towing,No,No,1,0,0,Spade,Sailing,5 - 7,Moderate,5 - 6 (17 - 27 knots),Night,0 - 25%,Over 10,200m+,On,On,White/light,Blue,No,No,"Orca interaction at 10:15pm on 01/11, 40 miles...",I would describe the behaviour of the Orca dur...
1,2023-10-31 07:50:00,"39 26.0000 N, 9 23.0000 W",Yes,0-10,Sail,12.5-15m,Not towing,No,Yes,2,5,0,Twin rudder,Motoring,5 - 7,Rough,3 - 4 (7 - 16 knots),Day,50 - 75%,2 - 5,40 - 200m,On,On,White/light,Black,"Yes, moderate - immediate repairs required",No,We had sandbags on our sugar scoops and metal ...,Juveniles hitting the rudders adults close by
2,2023-09-19 11:00:00,"37 40.0000 N, 8 54.0000 W",No,0-10,Sail,12.5-15m,Not towing,No,Yes,1,0,0,Spade,Motoring,5 - 7,Calm,3 - 4 (7 - 16 knots),Day,0 - 25%,2 - 5,40 - 200m,On,On,White/light,Other,"Yes, moderate - immediate repairs required",No,We saw the orca approach from 10 o’clock posit...,There was an initial approach 45 minutes earli...
3,2023-09-01 13:15:00,"45 36.0000 N, 3 45.0000 W",Yes,10-30,Sail,15+,Not towing,Yes,Yes,1,2,0,Spade,Sailing,3 - 4,Calm,3 - 4 (7 - 16 knots),Day,25 - 50%,Over 10,200m+,Off,Off,White/light,Black,"Yes, moderate - immediate repairs required",No,Les trois orques passent constamment de bâbord...,Pas de comportement visblement agressif./// No...
4,2023-09-02 03:45:00,"42 45.0000 N, 9 14.0000 W",Yes,0-10,Sail,12.5-15m,Not towing,No,Yes,1,2,0,Spade,Motorsailing,5 - 7,Calm,0 - 2 (0 - 6 knots),Night,0 - 25%,5 - 10,40 - 200m,On,On,White/light,Black,"Yes, moderate - immediate repairs required",Yes,Arrêt du pilote automatique a la 2 eme interac...,Approche furtive à la première interaction dir...


In [22]:
df.Boat_Length.value_counts()

Boat_Length
10-12.5     59
12.5-15m    56
15+         30
0-10         8
Name: count, dtype: int64

In [23]:
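# The counts match the original ones (59, 56, 30, 8), so every row was relabelled.
# Note that clean_length returns '12.5-15m' (with a trailing 'm') for the third range,
# so that label differs from the intended '12.5-15'. We move on to the next column.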

#### Towing Inflatable

In [24]:
df.Towing_Inflatable.unique()

array(['Not towing', 'Towing and interacted with inflatable first',
       'Unknown', 'Spade'], dtype=object)

In [25]:
df.Towing_Inflatable.value_counts()

Towing_Inflatable
Not towing                                     148
Unknown                                          3
Towing and interacted with inflatable first      1
Spade                                            1
Name: count, dtype: int64

In [26]:
# Locate the row containing 'Spade', a value that should not appear in this column:
df[df.Towing_Inflatable == 'Spade']

Unnamed: 0,date,lat_and_long,Followed_GTOA_Protocol,Interaction_time,Boat_Type,Boat_Length,Towing_Inflatable,Trailing_Fishing_Lure,Physical_Contact_With_Boat,Number_of_Adult_Orcas,Number_of_Juvenile_Orcas,Number_of_Uncertain_Age_Orcas,Rudder,Motoring_or_Sailing,Speed_Knots,Sea_State,Wind_Speed_Beaufort,Daylight_or_Darkness,Cloud_Cover,Distance_Off_Land_NM,Depth_Meters,Depth_Gauge,Autopilot,Hull_Topsides_Color,Antifoul_Color,Boat_Damaged,Tow_Required,Crew_Response,Orcas_Behaviour
144,2022-06-04 08:00:00,"36 4.4000 N, 5 59.9000 W",Yes,0-10,Sail,12.5-15m,Spade,Unknown,Unknown,0,0,0,Motoring,3 - 4,0 - 2 (0 - 6 knots),Day,5 - 10,40 - 200m,On,On,White/light,Black,"Yes, minor - will wait until the end of the se...",No,This report was obtained by GTOA. I stopped th...,This report was obtained by GTOA. It was only ...,daytime,waxing\n20% illuminated,Within 3 days of spring tide


The row with index 144 is somewhat scrambled. We fix it by hand because there is no clear pattern, in the following steps:

1) From the value in the *Wind_Speed_Beaufort* column onwards, shift every value 3 positions to the right (the last three values in the row are pushed out)
2) Set *Cloud_Cover* to Unknown
3) Set *Daylight_or_Darkness* to Day
4) Set *Wind_Speed_Beaufort* to 0 - 2 (0 - 6 knots)
5) Set *Sea_State* to Unknown
6) Set *Speed_Knots* to 3 - 4
7) Set *Motoring_or_Sailing* to Motoring
8) Set *Rudder* to Spade
9) Set *Towing_Inflatable* to Not towing
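
Manual corrections like these can be collected in a dictionary and applied in a loop, so the list of fixes doubles as documentation. The sketch below uses a toy dataframe with only three of the columns; the values in row 143 and the name `fixes` are assumptions made for illustration.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        'Rudder': ['Twin rudder', 'Motoring'],
        'Motoring_or_Sailing': ['Motorsailing', '3 - 4'],
        'Sea_State': ['Calm', 'Day'],
    },
    index=[143, 144],
)

# Column -> corrected value for the misaligned row
fixes = {
    'Rudder': 'Spade',
    'Motoring_or_Sailing': 'Motoring',
    'Sea_State': 'Unknown',
}

# Apply each correction to row 144 only
for column, value in fixes.items():
    toy.loc[144, column] = value

print(toy.loc[144])
```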


In [27]:
# 1) From the value in the *Wind_Speed_Beaufort* column onwards, shift every value 3 positions to the right

# Index of the row we want to modify
index_to_shift = 144

# Get the positional index of the 'Wind_Speed_Beaufort' column
Beaufort_column_index = df.columns.get_loc('Wind_Speed_Beaufort')

# Apply shift() to this row only, from 'Wind_Speed_Beaufort' onwards
df.iloc[index_to_shift, Beaufort_column_index:] = df.iloc[index_to_shift, Beaufort_column_index:].shift(3)

print(df.iloc[index_to_shift])

date                                                           2022-06-04 08:00:00
lat_and_long                                              36 4.4000 N, 5 59.9000 W
Followed_GTOA_Protocol                                                         Yes
Interaction_time                                                              0-10
Boat_Type                                                                     Sail
Boat_Length                                                               12.5-15m
Towing_Inflatable                                                            Spade
Trailing_Fishing_Lure                                                      Unknown
Physical_Contact_With_Boat                                                 Unknown
Number_of_Adult_Orcas                                                            0
Number_of_Juvenile_Orcas                                                         0
Number_of_Uncertain_Age_Orcas                                                    0
Rudd

In [28]:
# I will use .loc[] to make the changes
# 2) The value for *Cloud_Cover* will be changed to Unknown

df.loc[144, 'Cloud_Cover'] = 'Unknown'

# 3) The value for *Daylight_or_Darkness* will be changed to Day

df.loc[144, 'Daylight_or_Darkness'] = 'Day'

# 4) The value for *Wind_Speed_Beaufort* will be changed to 0 - 2 (0 - 6 knots)

df.loc[144, 'Wind_Speed_Beaufort'] = '0 - 2 (0 - 6 knots)'

# 5) The value for *Sea_State* will be changed to Unknown

df.loc[144, 'Sea_State'] = 'Unknown'

# 6) The value for *Speed_Knots* will be changed to 3 - 4

df.loc[144, 'Speed_Knots'] = '3 - 4'

# 7) The value for *Motoring_or_Sailing* will be changed to Motoring

df.loc[144, 'Motoring_or_Sailing'] = 'Motoring'

# 8) The value for *Rudder* will be changed to Spade

df.loc[144, 'Rudder'] = 'Spade'

# 9) The value for *Towing_Inflatable* will be changed to Not towing

df.loc[144, 'Towing_Inflatable'] = 'Not towing'


print(df.iloc[144])

date                                                           2022-06-04 08:00:00
lat_and_long                                              36 4.4000 N, 5 59.9000 W
Followed_GTOA_Protocol                                                         Yes
Interaction_time                                                              0-10
Boat_Type                                                                     Sail
Boat_Length                                                               12.5-15m
Towing_Inflatable                                                       Not towing
Trailing_Fishing_Lure                                                      Unknown
Physical_Contact_With_Boat                                                 Unknown
Number_of_Adult_Orcas                                                            0
Number_of_Juvenile_Orcas                                                         0
Number_of_Uncertain_Age_Orcas                                                    0
Rudd
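
The per-column assignments above could also be driven from a single dictionary, which keeps the list of corrections in one place and makes it harder for a comment and its assignment to drift apart. This is only a sketch on made-up data: the `toy` dataframe and the `corrections` mapping are hypothetical stand-ins, not the notebook's actual values.

```python
import pandas as pd

# Toy stand-in for the real dataframe (values made up for illustration).
toy = pd.DataFrame(
    {
        "Cloud_Cover": ["Day", "0 - 25%"],
        "Sea_State": ["3 - 4", "Calm"],
        "Rudder": [None, "Spade"],
    },
    dtype=object,
)

# Hypothetical mapping: column name -> corrected value for the misaligned row.
corrections = {"Cloud_Cover": "Unknown", "Sea_State": "Unknown", "Rudder": "Spade"}

row = 0  # index label of the row to fix
for column, value in corrections.items():
    toy.loc[row, column] = value

print(toy.loc[row])
```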

In [29]:
# Run value_counts() again
df.Towing_Inflatable.value_counts()

Towing_Inflatable
Not towing                                     149
Unknown                                          3
Towing and interacted with inflatable first      1
Name: count, dtype: int64

Given the nature of the responses, the values will be recoded as follows:
* Not towing = No
* Towing and interacted with inflatable first = Yes
* Unknown = Unknown (unchanged)

In [30]:
# Define a function that performs this cleaning
def limpia_inflatable(value):

    if value == 'Not towing':
        return 'No'
    elif value == 'Towing and interacted with inflatable first':
        return 'Yes'
    # Not among the current options, but it could appear in the future; it will be treated as 'Yes'
    elif value == 'Towing and interacted with boat first':
        return 'Yes'
    else:
        return 'Unknown'

In [31]:
df.Towing_Inflatable = df.Towing_Inflatable.apply(limpia_inflatable)
df.Towing_Inflatable.value_counts()

Towing_Inflatable
No         149
Unknown      3
Yes          1
Name: count, dtype: int64
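
As an alternative to `apply` with a custom function, the same recoding can be written as a lookup table passed to `Series.map`. The sketch below runs on a small made-up Series rather than on `df`; the names `inflatable_map`, `sample` and `recoded` are hypothetical. Values missing from the dictionary come back as `NaN`, so `fillna('Unknown')` plays the role of the function's `else` branch.

```python
import pandas as pd

# Same recoding as limpia_inflatable, expressed as a lookup table.
inflatable_map = {
    "Not towing": "No",
    "Towing and interacted with inflatable first": "Yes",
    "Towing and interacted with boat first": "Yes",
}

# Made-up sample covering a known value, an unmapped value and a missing value.
sample = pd.Series(
    ["Not towing", "Unknown", "Towing and interacted with inflatable first", None]
)

# Anything not found in the dictionary becomes NaN and is then labelled 'Unknown'.
recoded = sample.map(inflatable_map).fillna("Unknown")
print(recoded.value_counts())
```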

In [32]:
# Perfect, on to the next one.

#### Trailing Fishing Lure

In [33]:
df.Trailing_Fishing_Lure.unique()

array(['No', 'Yes', 'Unknown'], dtype=object)

In [34]:
df.Trailing_Fishing_Lure.value_counts()

Trailing_Fishing_Lure
Unknown    129
No          20
Yes          4
Name: count, dtype: int64

This column can stay as it is; on to the next column.

#### Physical Contact with Boat

In [35]:
df.Physical_Contact_With_Boat.value_counts()

Physical_Contact_With_Boat
Unknown    129
Yes         22
No           2
Name: count, dtype: int64

This may be a column we can drop later on, or whose large number of *Unknowns* we can fill in using the information contained in other columns. For now we will leave it as it is.
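
One possible way to fill some of those *Unknowns* from another column is sketched below. It rests on an assumption that the notebook does not make: that a report with damage in *Boat_Damaged* implies physical contact. The toy data is made up, and the rule would need checking against the real responses before being applied to `df`.

```python
import pandas as pd

# Toy data (made up) using the notebook's column names.
toy = pd.DataFrame(
    {
        "Physical_Contact_With_Boat": ["Unknown", "Unknown", "No", "Yes"],
        "Boat_Damaged": [
            "Yes, moderate - immediate repairs required",
            "No",
            "No",
            "No",
        ],
    }
)

# Assumption: damage to the boat implies that there was physical contact.
is_unknown = toy["Physical_Contact_With_Boat"].eq("Unknown")
is_damaged = toy["Boat_Damaged"].str.startswith("Yes", na=False)

# Only overwrite rows that are currently 'Unknown'; known answers are left alone.
toy.loc[is_unknown & is_damaged, "Physical_Contact_With_Boat"] = "Yes"
print(toy["Physical_Contact_With_Boat"].value_counts())
```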

#### Number of Adult, Juvenile and Uncertain Age Orcas

In [36]:
df.Number_of_Adult_Orcas.value_counts()

Number_of_Adult_Orcas
0    134
1     12
2      3
5      1
6      1
4      1
3      1
Name: count, dtype: int64

In [37]:
df.Number_of_Juvenile_Orcas.value_counts()

Number_of_Juvenile_Orcas
0    144
2      6
5      2
1      1
Name: count, dtype: int64

In [38]:
df.Number_of_Uncertain_Age_Orcas.value_counts()

Number_of_Uncertain_Age_Orcas
0    141
4      4
2      2
3      2
5      2
6      1
7      1
Name: count, dtype: int64

I am going to create a separate dataframe for the rows that have a 0 in all three columns. These zeros arise because the questionnaire format changed partway through the dataset, and the older format did not ask how many orcas interacted. However, that information can still be extracted from the *Crew_Response* and *Orcas_Behaviour* columns.

In [42]:
df_old_format = df[(df.Number_of_Adult_Orcas == '0') & (df.Number_of_Juvenile_Orcas == '0') & (df.Number_of_Uncertain_Age_Orcas == '0')]
df_old_format.shape

(126, 29)
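
The filter above compares against the string `'0'`, so it only matches while the three count columns hold strings. A variant that does not depend on the stored type is sketched below on made-up data: it coerces the columns with `pd.to_numeric` and then checks that every count is zero. The names `count_cols`, `toy` and `toy_old_format` are hypothetical.

```python
import pandas as pd

count_cols = [
    "Number_of_Adult_Orcas",
    "Number_of_Juvenile_Orcas",
    "Number_of_Uncertain_Age_Orcas",
]

# Toy data (made up) mixing string and integer counts on purpose.
toy = pd.DataFrame(
    {
        "Number_of_Adult_Orcas": ["0", "1", 0],
        "Number_of_Juvenile_Orcas": ["0", "2", "0"],
        "Number_of_Uncertain_Age_Orcas": ["0", "0", 0],
    },
    dtype=object,
)

# Coerce to numbers (unparseable entries become NaN), then keep rows where all three counts are 0.
counts = toy[count_cols].apply(pd.to_numeric, errors="coerce")
all_zero = counts.eq(0).all(axis=1)
toy_old_format = toy[all_zero]
print(toy_old_format.shape)
```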

In [12]:
df_copia = df.copy()
df_copia.head()

Unnamed: 0,date,lat_and_long,GTOA_Protocol,Boat_Type,Boat_Length,Towing_Inflatable,Trailing_Fishing_Lure,Physical_Contact_With_Boat,Number_of_Adult_Orcas,Number_of_Juvenile_Orcas,Number_of_Uncertain_Age_Orcas,Rudder,Motoring_or_Sailing,Speed_Knots,Sea_State,Wind_Speed_Beaufort,Daylight_or_Darkness,Cloud_Cover,Distance_Off_Land_NM,Depth_Meters,Depth_Gauge,Autopilot,Hull_Topsides_Color,Antifoul_Color,Boat_Damaged,Tow_Required,Crew_Response,Orcas_Behaviour
0,2023-11-01 22:15:00,"32 47.4980 N, 9 54.3980 W","No, did not follow protocol, interaction laste...",Sail,10 - 12.5m,Not towing,No,No,1,0,0,Spade,Sailing,5 - 7,Moderate,5 - 6 (17 - 27 knots),Night,0 - 25%,Over 10,200m+,On,On,White/light,Blue,No,No,"Orca interaction at 10:15pm on 01/11, 40 miles...",I would describe the behaviour of the Orca dur...
1,2023-10-31 07:50:00,"39 26.0000 N, 9 23.0000 W","Yes, followed protocol, interaction lasted les...",Sail,12.5 - 15m,Not towing,No,Yes,2,5,0,Twin rudder,Motoring,5 - 7,Rough,3 - 4 (7 - 16 knots),Day,50 - 75%,2 - 5,40 - 200m,On,On,White/light,Black,"Yes, moderate - immediate repairs required",No,We had sandbags on our sugar scoops and metal ...,Juveniles hitting the rudders adults close by
2,2023-09-19 11:00:00,"37 40.0000 N, 8 54.0000 W","No, did not follow protocol, interaction laste...",Sail,12.5 - 15m,Not towing,No,Yes,1,0,0,Spade,Motoring,5 - 7,Calm,3 - 4 (7 - 16 knots),Day,0 - 25%,2 - 5,40 - 200m,On,On,White/light,Other,"Yes, moderate - immediate repairs required",No,We saw the orca approach from 10 o’clock posit...,There was an initial approach 45 minutes earli...
3,2023-09-01 13:15:00,"45 36.0000 N, 3 45.0000 W","Yes, followed protocol, interaction lasted 10 ...",Sail,Over 15m,Not towing,Yes,Yes,1,2,0,Spade,Sailing,3 - 4,Calm,3 - 4 (7 - 16 knots),Day,25 - 50%,Over 10,200m+,Off,Off,White/light,Black,"Yes, moderate - immediate repairs required",No,Les trois orques passent constamment de bâbord...,Pas de comportement visblement agressif./// No...
4,2023-09-02 03:45:00,"42 45.0000 N, 9 14.0000 W","Yes, followed protocol, interaction lasted les...",Sail,12.5 - 15m,Not towing,No,Yes,1,2,0,Spade,Motorsailing,5 - 7,Calm,0 - 2 (0 - 6 knots),Night,0 - 25%,5 - 10,40 - 200m,On,On,White/light,Black,"Yes, moderate - immediate repairs required",Yes,Arrêt du pilote automatique a la 2 eme interac...,Approche furtive à la première interaction dir...


In [13]:
index_to_shift = 105

# Get the index of the 'Rudder' column
rudder_column_index = df_copia.columns.get_loc('Rudder')

# Use the shift method for the specified row and 'Rudder' column onwards
df_copia.iloc[index_to_shift, rudder_column_index:] = df_copia.iloc[index_to_shift, rudder_column_index:].shift(1)

In [14]:
df_copia

Unnamed: 0,date,lat_and_long,GTOA_Protocol,Boat_Type,Boat_Length,Towing_Inflatable,Trailing_Fishing_Lure,Physical_Contact_With_Boat,Number_of_Adult_Orcas,Number_of_Juvenile_Orcas,Number_of_Uncertain_Age_Orcas,Rudder,Motoring_or_Sailing,Speed_Knots,Sea_State,Wind_Speed_Beaufort,Daylight_or_Darkness,Cloud_Cover,Distance_Off_Land_NM,Depth_Meters,Depth_Gauge,Autopilot,Hull_Topsides_Color,Antifoul_Color,Boat_Damaged,Tow_Required,Crew_Response,Orcas_Behaviour
0,2023-11-01 22:15:00,"32 47.4980 N, 9 54.3980 W","No, did not follow protocol, interaction laste...",Sail,10 - 12.5m,Not towing,No,No,1,0,0,Spade,Sailing,5 - 7,Moderate,5 - 6 (17 - 27 knots),Night,0 - 25%,Over 10,200m+,On,On,White/light,Blue,No,No,"Orca interaction at 10:15pm on 01/11, 40 miles...",I would describe the behaviour of the Orca dur...
1,2023-10-31 07:50:00,"39 26.0000 N, 9 23.0000 W","Yes, followed protocol, interaction lasted les...",Sail,12.5 - 15m,Not towing,No,Yes,2,5,0,Twin rudder,Motoring,5 - 7,Rough,3 - 4 (7 - 16 knots),Day,50 - 75%,2 - 5,40 - 200m,On,On,White/light,Black,"Yes, moderate - immediate repairs required",No,We had sandbags on our sugar scoops and metal ...,Juveniles hitting the rudders adults close by
2,2023-09-19 11:00:00,"37 40.0000 N, 8 54.0000 W","No, did not follow protocol, interaction laste...",Sail,12.5 - 15m,Not towing,No,Yes,1,0,0,Spade,Motoring,5 - 7,Calm,3 - 4 (7 - 16 knots),Day,0 - 25%,2 - 5,40 - 200m,On,On,White/light,Other,"Yes, moderate - immediate repairs required",No,We saw the orca approach from 10 o’clock posit...,There was an initial approach 45 minutes earli...
3,2023-09-01 13:15:00,"45 36.0000 N, 3 45.0000 W","Yes, followed protocol, interaction lasted 10 ...",Sail,Over 15m,Not towing,Yes,Yes,1,2,0,Spade,Sailing,3 - 4,Calm,3 - 4 (7 - 16 knots),Day,25 - 50%,Over 10,200m+,Off,Off,White/light,Black,"Yes, moderate - immediate repairs required",No,Les trois orques passent constamment de bâbord...,Pas de comportement visblement agressif./// No...
4,2023-09-02 03:45:00,"42 45.0000 N, 9 14.0000 W","Yes, followed protocol, interaction lasted les...",Sail,12.5 - 15m,Not towing,No,Yes,1,2,0,Spade,Motorsailing,5 - 7,Calm,0 - 2 (0 - 6 knots),Night,0 - 25%,5 - 10,40 - 200m,On,On,White/light,Black,"Yes, moderate - immediate repairs required",Yes,Arrêt du pilote automatique a la 2 eme interac...,Approche furtive à la première interaction dir...
5,2023-08-22 11:50:00,"42 53.7600 N, 9 21.3900 W","No, did not follow protocol, interaction laste...",Sail,Over 15m,Not towing,Yes,Yes,5,0,0,Spade,Sailing,5 - 7,Calm,3 - 4 (7 - 16 knots),Day,0 - 25%,2 - 5,40 - 200m,On,On,White/light,Black,No,No,"We stayed calm, made no unnecessary noise and ...",The animals seemed to be curious and intereste...
6,2023-08-17 13:30:00,"35 52.7000 N, 5 38.6000 W","No, did not follow protocol, interaction laste...",Sail,Over 15m,Not towing,Yes,Yes,6,0,0,Twin rudder,Motorsailing,5 - 7,Calm,3 - 4 (7 - 16 knots),Day,0 - 25%,2 - 5,200m+,On,On,White/light,Blue,No,No,Pinger; no effect Fireworks; effective,"Came from stern side, submerged under stern an..."
7,2023-07-15 23:58:00,"36 15.0090 N, 5 11.2660 W","Yes, followed protocol, interaction lasted 30 ...",Sail,12.5 - 15m,Not towing,No,Yes,4,0,0,Sailing,5 - 7,Moderate,3 - 4 (7 - 16 knots),Night,0 - 25%,2 - 5,40 - 200m,On,On,White/light,Blue,"Yes, moderate - immediate repairs required",Yes,We were sailing and the boat was on autopilot....,One of them was under the boat. Others were at...,"Not applicable, did not reverse"
8,2023-08-09 11:30:00,"38 12.0000 N, 9 4.0000 W","Yes, followed protocol, interaction lasted les...",Sail,12.5 - 15m,Not towing,Yes,Yes,1,0,0,Full skeg,Motoring,5 - 7,Calm,0 - 2 (0 - 6 knots),Day,0 - 25%,Over 10,40 - 200m,On,On,Dark colour,Black,"Yes, minor - will wait until the end of the se...",No,"As we encountered the orca, we stopped the eng...",Swimming around a lot. O distance about 100 me...
9,2023-07-21 16:10:00,"36 24.6000 N, 4 51.2000 W","No, did not follow protocol, interaction laste...",Sail,Over 15m,Not towing,No,Yes,1,0,0,Spade,Motorsailing,5 - 7,Moderate,3 - 4 (7 - 16 knots),Day,50 - 75%,2 - 5,200m+,On,On,White/light,Coppercoat,No,No,Single orca struck boat (catamaran) on starboa...,"Swimming alongside, three blows to stern/rudder6"
