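## Analysis of Marine Species Cover and Oceanographic Data

This Python notebook combines marine species cover data (for example `S0617_TF14_porcentajes.csv`; the cells below read `Porcentajes_2024/ECOMARG2024_TF62_porcentajes.csv`) with oceanographic and navigation data (for example `S0617_TV14_tel.csv`; the cells below read `Porcentajes_2024/posi/E0424_TF62_posi.csv`) to analyze how species abundance relates to environmental conditions such as depth.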

### Key steps:

1.  **Data loading and preprocessing:** Clean the data, select the relevant columns, and standardize the image file names.
2.  **Data merging:** Join the DataFrames on the image file names.
3.  **Data analysis:** Descriptive statistics of cover per image and species (sum, mean, median, count, maximum, minimum), further statistical analysis, and visualization of the results.
4.  **Predictive modeling (optional):** Predict species distribution from environmental variables.
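Steps 3 and 4 are not implemented in the cells below. As a minimal sketch of a first pass, assuming the merged table built later (one row per image, with columns such as `Sum_Madrepora oculata` and `Water_Depth`), the hypothetical helper `cover_vs_depth` computes a Spearman rank correlation between cover and depth and fits a straight line with `numpy.polyfit`. The example values are invented, and a linear fit is only a placeholder for a proper species distribution model.

```python
import numpy as np
import pandas as pd


def cover_vs_depth(df: pd.DataFrame, cover_col: str, depth_col: str = "Water_Depth"):
    """Return (Spearman rho, slope, intercept) of cover against depth."""
    data = df[[cover_col, depth_col]].dropna()
    # Spearman correlation = Pearson correlation of the ranks (no SciPy needed)
    rho = data[cover_col].rank().corr(data[depth_col].rank())
    # Least-squares straight line: cover ~ slope * depth + intercept
    slope, intercept = np.polyfit(data[depth_col], data[cover_col], deg=1)
    return rho, slope, intercept


# Invented example values, not survey data
example = pd.DataFrame({
    "Sum_Madrepora oculata": [0.10, 0.20, 0.30, 0.40],
    "Water_Depth": [-900.0, -910.0, -920.0, -930.0],
})
rho, slope, intercept = cover_vs_depth(example, "Sum_Madrepora oculata")
print(f"Spearman rho = {rho:.2f}, slope = {slope:.4f}, intercept = {intercept:.2f}")
```

Ranking before correlating makes the measure robust to the skewed, zero-heavy cover values typical of per-image percentages.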

### Requirements:

*   Python 3.x
*   Pandas
*   NumPy

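### Usage:

1.  Place the CSV files at the paths the notebook reads from, relative to the working directory (`Porcentajes_2024/` for the cover percentages and `Porcentajes_2024/posi/` for the position data).
2.  Run the cells in order.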

### Limitations:

*   Assumes a specific image file-name format (`frame_<number>.jpg`) in both input tables.
*   The analysis and modeling are basic and can be extended.
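Because the merge only works when both tables produce identical names of the form `frame_<number>.jpg`, it can help to check that assumption before joining. The hypothetical helper below returns the names that do not match the pattern. For instance, a float photo number converted straight to text would yield `frame_12.0.jpg`, which is why the notebook casts `Foto` to `int` first.

```python
import pandas as pd


def invalid_frame_names(names: pd.Series) -> list:
    """Return the names that do not match the assumed frame_<n>.jpg pattern."""
    as_text = names.astype(str)
    # fullmatch requires the whole string to match, not just a prefix
    matches = as_text.str.fullmatch(r"frame_\d+\.jpg")
    return as_text[~matches].tolist()


names = pd.Series(["frame_0.jpg", "frame_12.jpg", "frame_12.0.jpg", "IMG_0003.JPG"])
print(invalid_frame_names(names))  # ['frame_12.0.jpg', 'IMG_0003.JPG']
```

An empty list for both `df['Archivo']` and the rebuilt `Foto` column suggests the join keys are compatible.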

### Next steps:

*   Include more geographic areas.
*   Explore more sophisticated models.
*   Incorporate additional data.

In [26]:
import pandas as pd

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Read the cover-percentage CSV file into a DataFrame
df = pd.read_csv('Porcentajes_2024/ECOMARG2024_TF62_porcentajes.csv')

# Group by image ('Archivo') and species ('Clase') and compute summary
# statistics of the covered area percentage for each group
df_grouped = (
    df.groupby(['Archivo', 'Clase'])['Porcentaje_Area']
    .agg(['sum', 'mean', 'median', 'count', 'max', 'min'])
    .rename(columns={'sum': 'Sum', 'mean': 'Mean', 'median': 'Median',
                     'count': 'Nº', 'max': 'Máx', 'min': 'Min'})
    .reset_index()
)

# Pivot to wide format: one row per image, one column per statistic and species;
# species not present in an image get 0
df_pivot = df_grouped.pivot(index='Archivo', columns='Clase', values=['Sum', 'Mean', 'Median', 'Nº', 'Máx', 'Min']).fillna(0)

# Flatten the multi-level column names, e.g. ('Sum', 'Madrepora oculata') -> 'Sum_Madrepora oculata'
df_pivot.columns = ['_'.join(col) for col in df_pivot.columns]

# Reset the index to make 'Archivo' a column again
df_result = df_pivot.reset_index()

# Display the resulting DataFrame (the commented line prints it as Markdown,
# which requires the optional `tabulate` package)
# print(df_result.to_markdown(index=False, numalign="left", stralign="left"))
df_result

| | Archivo | Sum_Desmophyllum pertusum | Sum_Madrepora oculata | Mean_Desmophyllum pertusum | Mean_Madrepora oculata | Median_Desmophyllum pertusum | Median_Madrepora oculata | Nº_Desmophyllum pertusum | Nº_Madrepora oculata | Máx_Desmophyllum pertusum | Máx_Madrepora oculata | Min_Desmophyllum pertusum | Min_Madrepora oculata |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | frame_0.jpg | 0.0 | 0.05 | 0.0 | 0.05 | 0.0 | 0.05 | 0.0 | 1.0 | 0.0 | 0.05 | 0.0 | 0.05 |
| 1 | frame_1.jpg | 0.0 | 0.08 | 0.0 | 0.08 | 0.0 | 0.08 | 0.0 | 1.0 | 0.0 | 0.08 | 0.0 | 0.08 |
| 2 | frame_107.jpg | 0.0 | 0.03 | 0.0 | 0.03 | 0.0 | 0.03 | 0.0 | 1.0 | 0.0 | 0.03 | 0.0 | 0.03 |
| 3 | frame_108.jpg | 0.0 | 0.06 | 0.0 | 0.06 | 0.0 | 0.06 | 0.0 | 1.0 | 0.0 | 0.06 | 0.0 | 0.06 |
| 4 | frame_11.jpg | 0.12 | 0.0 | 0.12 | 0.0 | 0.12 | 0.0 | 1.0 | 0.0 | 0.12 | 0.0 | 0.12 | 0.0 |
| 5 | frame_12.jpg | 0.0 | 0.83 | 0.0 | 0.415 | 0.0 | 0.415 | 0.0 | 2.0 | 0.0 | 0.45 | 0.0 | 0.38 |
| 6 | frame_120.jpg | 0.06 | 0.0 | 0.06 | 0.0 | 0.06 | 0.0 | 1.0 | 0.0 | 0.06 | 0.0 | 0.06 | 0.0 |
| 7 | frame_13.jpg | 0.0 | 0.18 | 0.0 | 0.09 | 0.0 | 0.09 | 0.0 | 2.0 | 0.0 | 0.1 | 0.0 | 0.08 |
| 8 | frame_135.jpg | 0.02 | 0.0 | 0.02 | 0.0 | 0.02 | 0.0 | 1.0 | 0.0 | 0.02 | 0.0 | 0.02 | 0.0 |
| 9 | frame_136.jpg | 0.0 | 0.13 | 0.0 | 0.13 | 0.0 | 0.13 | 0.0 | 1.0 | 0.0 | 0.13 | 0.0 | 0.13 |
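To make the reshaping in this cell easier to follow, the sketch below runs the same group, aggregate, pivot, and flatten steps on a tiny invented table, assuming (as the `Nº` counts suggest) that the input has one row per annotated region. Each image becomes one row with one `<statistic>_<species>` column per combination, and species absent from an image get 0 after `fillna(0)`. The notebook uses the labels `Sum`, `Mean`, `Nº`, etc. instead of pandas' default `sum`, `mean`, `count`.

```python
import pandas as pd

# Invented input: one row per annotated region, as assumed above
df = pd.DataFrame({
    "Archivo": ["frame_0.jpg", "frame_0.jpg", "frame_0.jpg", "frame_1.jpg"],
    "Clase": ["Madrepora oculata", "Madrepora oculata",
              "Desmophyllum pertusum", "Desmophyllum pertusum"],
    "Porcentaje_Area": [0.25, 0.75, 0.5, 1.0],
})

# Long format: one row per (image, species) with the summary statistics
stats = (
    df.groupby(["Archivo", "Clase"])["Porcentaje_Area"]
    .agg(["sum", "mean", "count"])
    .reset_index()
)

# Wide format: one row per image; missing (image, species) pairs become 0
wide = stats.pivot(index="Archivo", columns="Clase", values=["sum", "mean", "count"]).fillna(0)
wide.columns = ["_".join(col) for col in wide.columns]
wide = wide.reset_index()
print(wide)
```

Filling with 0 treats "species not annotated in this image" as zero cover, which is convenient for mapping but should be kept in mind when computing means across images.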


In [27]:
import pandas as pd

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Read the position/depth CSV file into a DataFrame
df_tel = pd.read_csv('Porcentajes_2024/posi/E0424_TF62_posi.csv')

# Keep only rows with a photo number in `Foto`, select the columns
# 'Water_Depth', 'Foto', 'SUB1_lon' and 'SUB1_lat', and take an explicit
# copy so the assignments below do not raise SettingWithCopyWarning
df_tel_filtered_selected = (
    df_tel.dropna(subset=['Foto'])
    [['Water_Depth', 'Foto', 'SUB1_lon', 'SUB1_lat']]
    .copy()
)

# Build the image file name from the photo number, e.g. 12.0 -> 'frame_12.jpg'
# (cast to int first so the name does not become 'frame_12.0.jpg')
df_tel_filtered_selected['Foto'] = (
    'frame_' + df_tel_filtered_selected['Foto'].astype(int).astype(str) + '.jpg'
)

# Display the resulting DataFrame
df_tel_filtered_selected

| | Water_Depth | Foto | SUB1_lon | SUB1_lat |
|---|---|---|---|---|
| 0 | -907.72 | frame_0.jpg | -5.293647 | 43.923853 |
| 1 | -908.34 | frame_1.jpg | -5.29364 | 43.923855 |
| 2 | -908.15 | frame_2.jpg | -5.293645 | 43.923865 |
| 3 | -908.02 | frame_3.jpg | -5.293648 | 43.923863 |
| 4 | -907.92 | frame_4.jpg | -5.293643 | 43.923865 |
| 5 | -907.82 | frame_5.jpg | -5.293655 | 43.92387 |
| 6 | -907.75 | frame_6.jpg | -5.293663 | 43.923867 |
| 7 | -907.68 | frame_7.jpg | -5.29367 | 43.92387 |
| 8 | -911.06 | frame_8.jpg | -5.293673 | 43.92388 |
| 9 | -911.06 | frame_9.jpg | -5.29368 | 43.923873 |
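The photo number is presumably read as a float because the `Foto` column contains missing values, which plain integer columns cannot hold in pandas. A minimal sketch of the same clean-up on an invented position log, working on an explicit `.copy()` of the filtered slice so that assigning to `Foto` does not trigger `SettingWithCopyWarning`:

```python
import numpy as np
import pandas as pd

# Invented position log: photo numbers are floats because one row has no photo
df_tel = pd.DataFrame({
    "Water_Depth": [-900.0, -900.5, -901.0],
    "Foto": [0.0, np.nan, 1.0],
    "SUB1_lon": [-5.0, -5.1, -5.2],
    "SUB1_lat": [43.0, 43.1, 43.2],
})

# Keep rows with a photo, select columns, and take an explicit copy
tel = df_tel.dropna(subset=["Foto"])[["Water_Depth", "Foto", "SUB1_lon", "SUB1_lat"]].copy()

# 0.0 -> 0 -> "0" -> "frame_0.jpg" (skipping the int cast would give "frame_0.0.jpg")
tel["Foto"] = "frame_" + tel["Foto"].astype(int).astype(str) + ".jpg"
print(tel["Foto"].tolist())  # ['frame_0.jpg', 'frame_1.jpg']
```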


In [28]:
# Combine df_result and df_tel_filtered_selected, matching 'Archivo' with 'Foto';
# the inner join keeps only images present in both tables
df_combined = pd.merge(df_result, df_tel_filtered_selected, left_on='Archivo', right_on='Foto', how='inner')
# Drop the now-redundant column `Foto` from df_combined
df_combined = df_combined.drop(columns=['Foto'])
# Save df_combined to a CSV file named "E0424_TF62_GIS.csv"
df_combined.to_csv('E0424_TF62_GIS.csv', index=False)

df_combined

| | Archivo | Sum_Desmophyllum pertusum | Sum_Madrepora oculata | Mean_Desmophyllum pertusum | Mean_Madrepora oculata | Median_Desmophyllum pertusum | Median_Madrepora oculata | Nº_Desmophyllum pertusum | Nº_Madrepora oculata | Máx_Desmophyllum pertusum | Máx_Madrepora oculata | Min_Desmophyllum pertusum | Min_Madrepora oculata | Water_Depth | SUB1_lon | SUB1_lat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | frame_0.jpg | 0.0 | 0.05 | 0.0 | 0.05 | 0.0 | 0.05 | 0.0 | 1.0 | 0.0 | 0.05 | 0.0 | 0.05 | -907.72 | -5.293647 | 43.923853 |
| 1 | frame_1.jpg | 0.0 | 0.08 | 0.0 | 0.08 | 0.0 | 0.08 | 0.0 | 1.0 | 0.0 | 0.08 | 0.0 | 0.08 | -908.34 | -5.29364 | 43.923855 |
| 2 | frame_107.jpg | 0.0 | 0.03 | 0.0 | 0.03 | 0.0 | 0.03 | 0.0 | 1.0 | 0.0 | 0.03 | 0.0 | 0.03 | -950.03 | -5.294575 | 43.924175 |
| 3 | frame_108.jpg | 0.0 | 0.06 | 0.0 | 0.06 | 0.0 | 0.06 | 0.0 | 1.0 | 0.0 | 0.06 | 0.0 | 0.06 | -957.17 | -5.294592 | 43.924183 |
| 4 | frame_11.jpg | 0.12 | 0.0 | 0.12 | 0.0 | 0.12 | 0.0 | 1.0 | 0.0 | 0.12 | 0.0 | 0.12 | 0.0 | -907.71 | -5.293705 | 43.92388 |
| 5 | frame_12.jpg | 0.0 | 0.83 | 0.0 | 0.415 | 0.0 | 0.415 | 0.0 | 2.0 | 0.0 | 0.45 | 0.0 | 0.38 | -907.71 | -5.293715 | 43.92389 |
| 6 | frame_120.jpg | 0.06 | 0.0 | 0.06 | 0.0 | 0.06 | 0.0 | 1.0 | 0.0 | 0.06 | 0.0 | 0.06 | 0.0 | -957.24 | -5.294732 | 43.924228 |
| 7 | frame_13.jpg | 0.0 | 0.18 | 0.0 | 0.09 | 0.0 | 0.09 | 0.0 | 2.0 | 0.0 | 0.1 | 0.0 | 0.08 | -907.91 | -5.293715 | 43.92388 |
| 8 | frame_135.jpg | 0.02 | 0.0 | 0.02 | 0.0 | 0.02 | 0.0 | 1.0 | 0.0 | 0.02 | 0.0 | 0.02 | 0.0 | -955.24 | -5.29491 | 43.924282 |
| 9 | frame_136.jpg | 0.0 | 0.13 | 0.0 | 0.13 | 0.0 | 0.13 | 0.0 | 1.0 | 0.0 | 0.13 | 0.0 | 0.13 | -955.24 | -5.29492 | 43.924282 |
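An inner join silently drops images that appear in only one of the two tables. One way to see how many are lost, using the same key columns, is to repeat the merge with `how='outer'` and `indicator=True`, which adds a `_merge` column with the values `both`, `left_only` and `right_only`. A sketch on invented tables:

```python
import pandas as pd

# Invented example tables with the notebook's key columns
cover = pd.DataFrame({
    "Archivo": ["frame_0.jpg", "frame_1.jpg", "frame_99.jpg"],
    "Sum_Madrepora oculata": [0.1, 0.2, 0.3],
})
tel = pd.DataFrame({
    "Foto": ["frame_0.jpg", "frame_1.jpg", "frame_2.jpg"],
    "Water_Depth": [-900.0, -901.0, -902.0],
})

# Outer merge with an indicator column recording where each row came from
check = pd.merge(cover, tel, left_on="Archivo", right_on="Foto", how="outer", indicator=True)
counts = check["_merge"].value_counts()
print(counts.to_dict())

# Annotated images without a position/depth record (lost by the inner join)
missing = check.loc[check["_merge"] == "left_only", "Archivo"].tolist()
print(missing)  # ['frame_99.jpg']
```

A non-zero `left_only` count on the real data would point to file-name mismatches or gaps in the position log.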


In [25]:
!ls GIS

E0514_TF07_GIS.csv  E0717_TF16_GIS.csv	E0719_TF18_GIS.csv  IC222_TF03_GIS.csv
E0717_TF14_GIS.csv  E0719_TF05_GIS.csv	E0719_TF19_GIS.csv  S0617_TF14_GIS.csv
E0717_TF15_GIS.csv  E0719_TF16_GIS.csv	IC222_TF01_GIS.csv  Untitled.ipynb
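The `GIS` folder already holds one `*_GIS.csv` file per survey. As a sketch of how they could be stacked into a single table for a multi-area analysis, assuming the files share the notebook's column layout, the hypothetical helper `load_gis_tables` reads every matching file with `pathlib` and records its origin in a hypothetical `Source` column taken from the file stem (for example `S0617_TF14`).

```python
from pathlib import Path

import pandas as pd


def load_gis_tables(folder: str = "GIS") -> pd.DataFrame:
    """Stack every *_GIS.csv in `folder`, tagging rows with the file stem."""
    frames = []
    for path in sorted(Path(folder).glob("*_GIS.csv")):
        table = pd.read_csv(path)
        # e.g. "S0617_TF14_GIS.csv" -> stem "S0617_TF14_GIS" -> "S0617_TF14"
        table["Source"] = path.stem[: -len("_GIS")]
        frames.append(table)
    if not frames:
        return pd.DataFrame()
    # Columns missing from some files (e.g. species not observed) become NaN
    return pd.concat(frames, ignore_index=True)


df_all = load_gis_tables("GIS")
```

Non-CSV files in the folder, such as `Untitled.ipynb`, are skipped by the glob pattern. Species columns that exist in only some surveys end up as NaN elsewhere, so they may need `fillna(0)` before comparing areas.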
