## a) Import the useful Python libraries.

Here we import the libraries we will use: pandas for data analysis and DataFrame manipulation, geopandas for geospatial data, dateutil for handling dates, and shapely.geometry for working with geometric objects in the Cartesian plane.

In [1]:
import pandas as pd
import geopandas as gpd
import dateutil
from shapely.geometry import shape, Point, Polygon, mapping

## b) Import the shapefile of strategic noise map of the agglomeration of Barcelonès I.

We define the path to the shapefile that we are going to load.

In [2]:
fp1 = "mesBCNI_SHP/mesBCN1.shp"
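
A shapefile is not a single file: the `.shp` geometry file normally sits next to a `.shx` index and a `.dbf` attribute table, and an optional `.prj` file carries the CRS. As a minimal sketch, a check like the one below could be run before reading the file. The helper name `missing_components` is hypothetical and not part of the original notebook.

```python
from pathlib import Path

# Components that normally accompany a .shp file. The first three are
# required by the shapefile format; .prj is optional but carries the CRS.
REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj",)


def missing_components(shp_path, suffixes=REQUIRED):
    """Return the suffixes from `suffixes` that have no file next to shp_path."""
    base = Path(shp_path)
    return [s for s in suffixes if not base.with_suffix(s).exists()]


# Hypothetical usage with the path defined in the notebook:
# missing_components("mesBCNI_SHP/mesBCN1.shp")
```

An empty list means all the requested components were found next to the `.shp` file.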

## c) Read the shapefile as Geodataframe ("Noise").

In [3]:
Noise = gpd.read_file(fp1)

## d) Check the table of the Geodataframe.

We can display the first 5 rows of our GeoDataFrame.

In [4]:
Noise.head()

| | IDTRAM | TVLDIA | TVLVES | TVLNIT | TVLDEN | TFLDIA | TFLVES | TFLNIT | TFLDEN | TALDIA | ... | TOTDIA | TOTVES | TOTNIT | TOTDEN | POBTOT | POBINT | POBEXT | IDAGLO | CODI_INE | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 69 | 69 | 64 | 72 | 0 | 0 | 0 | 0 | 0 | ... | 69 | 69 | 64 | 72 | 419 | 0 | 419 | BCN1 | 8019 | LINESTRING (431811.319 4586053.808, 431815.939... |
| 1 | 2 | 63 | 63 | 56 | 65 | 0 | 0 | 0 | 0 | 0 | ... | 63 | 63 | 56 | 65 | 225 | 0 | 225 | BCN1 | 8019 | LINESTRING (431945.629 4586034.777, 431909.599... |
| 2 | 3 | 63 | 63 | 56 | 65 | 0 | 0 | 0 | 0 | 0 | ... | 63 | 63 | 56 | 65 | 76 | 0 | 76 | BCN1 | 8019 | LINESTRING (431856.848 4585949.038, 431921.499... |
| 3 | 4 | 58 | 58 | 51 | 60 | 0 | 0 | 0 | 0 | 0 | ... | 58 | 58 | 51 | 60 | 161 | 0 | 161 | BCN1 | 8019 | LINESTRING (431649.598 4585971.319, 431664.329... |
| 4 | 5 | 67 | 67 | 62 | 70 | 0 | 0 | 0 | 0 | 0 | ... | 67 | 67 | 62 | 70 | 243 | 0 | 243 | BCN1 | 8019 | LINESTRING (431437.549 4586107.351, 431451.810... |


## e) Check the number of rows of the dataset "Noise".

To get the total number of rows we count the IDTRAM column. It acts as the primary key of the dataframe, so this column should not contain null values, and counting its non-null entries gives the total number of rows in the GeoDataFrame.

In [5]:
Noise['IDTRAM'].count()

16742
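
Counting a column only matches the number of rows when that column has no nulls, because `.count()` skips them. As a small sketch on a hypothetical toy frame (not the real dataset), `len()` and `.shape` give the row count regardless of nulls:

```python
import pandas as pd

# Hypothetical toy frame standing in for "Noise"; one TVLVES value is missing.
df = pd.DataFrame({
    "IDTRAM": [1, 2, 3, 4],
    "TVLVES": [69.0, None, 63.0, 58.0],
})

n_rows = len(df)                   # counts rows, nulls included
n_rows_shape = df.shape[0]         # same number, taken from the shape tuple
n_non_null = df["TVLVES"].count()  # counts only non-null entries

print(n_rows, n_rows_shape, n_non_null)
```

In the notebook both approaches agree, since IDTRAM turns out to have no nulls.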

## f) Check the data type.

We confirm that the object is indeed a geopandas GeoDataFrame.

In [6]:
type(Noise)

geopandas.geodataframe.GeoDataFrame

## g) Check the Coordinate Reference system (CRS).

GeoDataFrames expose a .crs attribute, which lets us check the coordinate reference system (CRS) that the shapefile uses to represent its spatial data.

In [7]:
Noise.crs

<Projected CRS: EPSG:25831>
Name: ETRS89 / UTM zone 31N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Europe - 0°E to 6°E and ETRS89 by country
- bounds: (0.0, 37.0, 6.01, 82.41)
Coordinate Operation:
- name: UTM zone 31N
- method: Transverse Mercator
Datum: European Terrestrial Reference System 1989
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
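
Since EPSG:25831 is a projected CRS with axes in metres, lengths computed on the geometries are in metres too. The sketch below illustrates this on a hypothetical two-point segment: the coordinates are only illustrative values in the same range as the output above, not an actual row of the dataset, and the names `demo` and `demo_wgs84` are assumptions.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical segment with a 3 m offset east and a 4 m offset north.
segment = LineString([(431811.0, 4586053.0), (431814.0, 4586057.0)])
demo = gpd.GeoDataFrame({"IDTRAM": [1]}, geometry=[segment], crs="EPSG:25831")

epsg_code = demo.crs.to_epsg()           # numeric EPSG code of the CRS
is_projected = demo.crs.is_projected     # True for projected systems such as UTM
length_m = demo.geometry.length.iloc[0]  # planar length, in CRS units (metres)

# Reproject to geographic coordinates (longitude/latitude in degrees).
demo_wgs84 = demo.to_crs(epsg=4326)

print(epsg_code, is_projected, length_m)
```

Reprojecting with `to_crs` is only needed when the data has to be combined with layers in another CRS, for example a web map in EPSG:4326.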

## h) Check the geometry field format.

We confirm that the geometry column has the GeometryDtype dtype, which means that this column stores the geometry of each record.

In [8]:
Noise['geometry'].dtype

<geopandas.array.GeometryDtype at 0x2732303c520>

## i) Describing the data variables of the dataset "Noise"

We use .describe() to get summary statistics for the numeric columns of the dataset.

In [9]:
Noise.describe()

| | TVLDIA | TVLVES | TVLNIT | TVLDEN | TFLDIA | TFLVES | TFLNIT | TFLDEN | TALDIA | TALVES | ... | OCLVES | OCLNIT | OCLDEN | TOTDIA | TOTVES | TOTNIT | TOTDEN | POBTOT | POBINT | POBEXT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | ... | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 |
| mean | 58.845419 | 57.440748 | 51.615697 | 60.265022 | 2.384303 | 2.313642 | 2.075081 | 2.484649 | 0.0 | 0.0 | ... | 0.653267 | 2.375463 | 2.643472 | 62.118863 | 60.575977 | 54.569824 | 63.717955 | 98.187373 | 11.534882 | 86.652491 |
| std | 16.799869 | 16.409335 | 15.294187 | 17.178666 | 10.681613 | 10.357224 | 9.321741 | 11.113437 | 0.0 | 0.0 | ... | 6.182074 | 11.50832 | 12.672604 | 7.518329 | 7.465553 | 8.027668 | 7.639956 | 133.26777 | 53.307138 | 130.068519 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 58.0 | 57.0 | 50.0 | 59.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 58.0 | 57.0 | 50.0 | 60.0 | 7.0 | 0.0 | 0.0 |
| 50% | 63.0 | 62.0 | 55.0 | 64.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 63.0 | 62.0 | 56.0 | 65.0 | 53.0 | 0.0 | 38.0 |
| 75% | 67.0 | 66.0 | 60.0 | 69.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 67.0 | 66.0 | 60.0 | 69.0 | 137.75 | 0.0 | 119.0 |
| max | 79.0 | 77.0 | 73.0 | 81.0 | 67.0 | 64.0 | 58.0 | 67.0 | 0.0 | 0.0 | ... | 66.0 | 68.0 | 74.0 | 79.0 | 77.0 | 73.0 | 81.0 | 2776.0 | 1773.0 | 2776.0 |


We can also compute summary statistics for specific columns of the dataframe.

In [10]:
Noise["TVLDIA"].mean()

58.84541870744236
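
In the `.describe()` output, TVLDIA has a minimum of 0 and a median (63) above its mean (58.85), which suggests that zero entries pull the mean down. The sketch below compares the mean with and without zeros on hypothetical values. It assumes that 0 marks segments where no traffic noise was recorded, which the notebook does not state.

```python
import pandas as pd

# Hypothetical values; 0 is ASSUMED to mean "no traffic noise recorded".
tvldia = pd.Series([69, 63, 0, 58, 67, 0], name="TVLDIA")

mean_all = tvldia.mean()                  # zeros included
median_all = tvldia.median()              # less sensitive to the zeros
mean_nonzero = tvldia[tvldia > 0].mean()  # zeros excluded
share_zero = (tvldia == 0).mean()         # fraction of zero entries

print(mean_all, median_all, mean_nonzero, share_zero)
```

Whether excluding the zeros is appropriate depends on how the dataset's metadata defines them.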

## j) Calculate the number of non-null unique INLDIA entries.

In [11]:
Noise["INLDIA"].nunique()

22
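
By default `.nunique()` ignores null values, which is why it answers the "non-null unique entries" question directly. A small sketch on hypothetical values shows the difference made by the `dropna` parameter, and how `.value_counts()` gives the frequency of each value:

```python
import pandas as pd

# Hypothetical values, including one missing entry.
s = pd.Series([60.0, 60.0, None, 65.0, 0.0], name="INLDIA")

n_unique = s.nunique()                        # nulls ignored (default)
n_unique_with_null = s.nunique(dropna=False)  # NaN counted as one more value
counts = s.value_counts()                     # frequency of each non-null value

print(n_unique, n_unique_with_null)
print(counts)
```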

## k) Calculate the number of non-null unique TVLVES entries and describe the variable.

In [12]:
Noise["TVLVES"].nunique()

58

In [13]:
Noise["TVLVES"].describe()

count    16742.000000
mean        57.440748
std         16.409335
min          0.000000
25%         57.000000
50%         62.000000
75%         66.000000
max         77.000000
Name: TVLVES, dtype: float64

## l) Check the presence of Null values in the dataset "Noise".

We can check the dataset for nulls by chaining .isnull() with .any(). The result is a boolean Series with True for each column that contains at least one null and False for each column that contains none.

In [14]:
Noise.isnull().any()

IDTRAM      False
TVLDIA      False
TVLVES      False
TVLNIT      False
TVLDEN      False
TFLDIA      False
TFLVES      False
TFLNIT      False
TFLDEN      False
TALDIA      False
TALVES      False
TALNIT      False
TALDEN      False
INLDIA      False
INLVES      False
INLNIT      False
INLDEN      False
OCLDIA      False
OCLVES      False
OCLNIT      False
OCLDEN      False
TOTDIA      False
TOTVES      False
TOTNIT      False
TOTDEN      False
POBTOT      False
POBINT      False
POBEXT      False
IDAGLO      False
CODI_INE    False
geometry    False
dtype: bool
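
With 31 columns, scanning the boolean Series by eye is error-prone. As a sketch on a hypothetical toy frame, the same result can be reduced to a single yes/no answer and to the list of affected columns:

```python
import pandas as pd

# Hypothetical toy frame with one missing TVLVES value.
df = pd.DataFrame({
    "IDTRAM": [1, 2, 3],
    "TVLVES": [69.0, None, 63.0],
    "IDAGLO": ["BCN1", "BCN1", "BCN1"],
})

per_column = df.isnull().any()           # True for columns with at least one null
has_any_null = bool(per_column.any())    # single answer for the whole frame
columns_with_nulls = per_column[per_column].index.tolist()

print(has_any_null, columns_with_nulls)
```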

## m) Count the number of Null values in the dataset "Noise".

We can also count exactly how many nulls each column of the dataset contains.

In [15]:
count_nan = len(Noise) - Noise.count()
count_nan

IDTRAM      0
TVLDIA      0
TVLVES      0
TVLNIT      0
TVLDEN      0
TFLDIA      0
TFLVES      0
TFLNIT      0
TFLDEN      0
TALDIA      0
TALVES      0
TALNIT      0
TALDEN      0
INLDIA      0
INLVES      0
INLNIT      0
INLDEN      0
OCLDIA      0
OCLVES      0
OCLNIT      0
OCLDEN      0
TOTDIA      0
TOTVES      0
TOTNIT      0
TOTDEN      0
POBTOT      0
POBINT      0
POBEXT      0
IDAGLO      0
CODI_INE    0
geometry    0
dtype: int64
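
Subtracting `.count()` from the number of rows works because `.count()` only counts non-null entries. An equivalent and common alternative is `.isna().sum()`. The sketch below, on a hypothetical toy frame, checks that both give the same result:

```python
import pandas as pd

# Hypothetical toy frame with two missing TVLVES values.
df = pd.DataFrame({
    "IDTRAM": [1, 2, 3, 4],
    "TVLVES": [69.0, None, 63.0, None],
})

by_subtraction = len(df) - df.count()  # approach used in the notebook
by_sum = df.isna().sum()               # sums the True values per column
total_nulls = int(by_sum.sum())        # nulls in the whole frame

print(by_sum)
print(total_nulls)
```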

## n) Drop the Null Values in the columns 'IDTRAM' and 'TVLVES' and build a new geodataframe called "Noise2".

With .dropna() we can remove the rows that have null values in the columns we specify. Together with the null-detection methods above, this is useful for cleaning the data.

In [16]:
Noise2 = Noise.dropna(subset=['IDTRAM', 'TVLVES'])
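
The `subset` argument restricts the check to the listed columns: a row is dropped only if it has a null in one of them, and nulls in other columns are left alone. A sketch on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy frame with nulls in three different columns.
df = pd.DataFrame({
    "IDTRAM": [1, 2, None, 4],
    "TVLVES": [69.0, None, 63.0, 58.0],
    "POBTOT": [419.0, 225.0, 76.0, None],
})

# Rows 1 and 2 are dropped; row 3 stays although its POBTOT is null.
cleaned = df.dropna(subset=["IDTRAM", "TVLVES"])
rows_removed = len(df) - len(cleaned)

print(cleaned)
print(rows_removed)
```

Note that `.dropna()` returns a new object and leaves the original frame unchanged, which is why the notebook assigns the result to a new variable.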

## o) Check the number of rows of the dataset "Noise2".

In [17]:
Noise2['IDAGLO'].count()

16742

## p) Number of non-null unique IDTRAM entries of the dataset "Noise2".

We confirm that IDTRAM is unique for each record: it has as many unique non-null values as there are rows. This matches the concept of a primary key, which can be neither null nor repeated across records.

In [18]:
Noise2['IDTRAM'].nunique()

16742
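
Comparing `.nunique()` with the row count is one way to test for a key. As a sketch, pandas also offers `.is_unique` for the same check and `.duplicated()` to locate offending values; the series below is hypothetical and deliberately contains a repeated ID:

```python
import pandas as pd

# Hypothetical IDs with one repeated value.
ids = pd.Series([1, 2, 3, 3, 5], name="IDTRAM")

# A key candidate must be unique and free of nulls.
is_key = bool(ids.is_unique and ids.notna().all())

# keep=False marks every occurrence of a repeated value, not only the later ones.
repeated = ids[ids.duplicated(keep=False)].tolist()

print(is_key, repeated)
```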

## q) Number of non-null unique IDAGLO entries of the dataset "Noise2".

According to the dataset's metadata reference, IDAGLO is the name of the agglomeration. In our case it is always BCN1, because we are working only with the BCN1 agglomeration.


In [None]:
Noise2['IDAGLO'].nunique()

1

## r) Describing the dataset "Noise2".

In [None]:
Noise2.describe()

| | TVLDIA | TVLVES | TVLNIT | TVLDEN | TFLDIA | TFLVES | TFLNIT | TFLDEN | TALDIA | TALVES | ... | OCLVES | OCLNIT | OCLDEN | TOTDIA | TOTVES | TOTNIT | TOTDEN | POBTOT | POBINT | POBEXT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | ... | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 | 16742.0 |
| mean | 58.845419 | 57.440748 | 51.615697 | 60.265022 | 2.384303 | 2.313642 | 2.075081 | 2.484649 | 0.0 | 0.0 | ... | 0.653267 | 2.375463 | 2.643472 | 62.118863 | 60.575977 | 54.569824 | 63.717955 | 98.187373 | 11.534882 | 86.652491 |
| std | 16.799869 | 16.409335 | 15.294187 | 17.178666 | 10.681613 | 10.357224 | 9.321741 | 11.113437 | 0.0 | 0.0 | ... | 6.182074 | 11.50832 | 12.672604 | 7.518329 | 7.465553 | 8.027668 | 7.639956 | 133.26777 | 53.307138 | 130.068519 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 58.0 | 57.0 | 50.0 | 59.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 58.0 | 57.0 | 50.0 | 60.0 | 7.0 | 0.0 | 0.0 |
| 50% | 63.0 | 62.0 | 55.0 | 64.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 63.0 | 62.0 | 56.0 | 65.0 | 53.0 | 0.0 | 38.0 |
| 75% | 67.0 | 66.0 | 60.0 | 69.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 67.0 | 66.0 | 60.0 | 69.0 | 137.75 | 0.0 | 119.0 |
| max | 79.0 | 77.0 | 73.0 | 81.0 | 67.0 | 64.0 | 58.0 | 67.0 | 0.0 | 0.0 | ... | 66.0 | 68.0 | 74.0 | 79.0 | 77.0 | 73.0 | 81.0 | 2776.0 | 1773.0 | 2776.0 |


## s) Check the presence of Null values in the dataset "Noise2".

In [None]:
Noise2.isnull().any()

IDTRAM      False
TVLDIA      False
TVLVES      False
TVLNIT      False
TVLDEN      False
TFLDIA      False
TFLVES      False
TFLNIT      False
TFLDEN      False
TALDIA      False
TALVES      False
TALNIT      False
TALDEN      False
INLDIA      False
INLVES      False
INLNIT      False
INLDEN      False
OCLDIA      False
OCLVES      False
OCLNIT      False
OCLDEN      False
TOTDIA      False
TOTVES      False
TOTNIT      False
TOTDEN      False
POBTOT      False
POBINT      False
POBEXT      False
IDAGLO      False
CODI_INE    False
geometry    False
dtype: bool

## t) Count the number of Null values in the dataset "Noise2".

In [None]:
count_nan2 = len(Noise2) - Noise2.count()
count_nan2

IDTRAM      0
TVLDIA      0
TVLVES      0
TVLNIT      0
TVLDEN      0
TFLDIA      0
TFLVES      0
TFLNIT      0
TFLDEN      0
TALDIA      0
TALVES      0
TALNIT      0
TALDEN      0
INLDIA      0
INLVES      0
INLNIT      0
INLDEN      0
OCLDIA      0
OCLVES      0
OCLNIT      0
OCLDEN      0
TOTDIA      0
TOTVES      0
TOTNIT      0
TOTDEN      0
POBTOT      0
POBINT      0
POBEXT      0
IDAGLO      0
CODI_INE    0
geometry    0
dtype: int64
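
Since neither IDTRAM nor TVLVES contained nulls, `.dropna()` had nothing to remove, which is why "Noise2" shows the same row count and statistics as "Noise". As a sketch on hypothetical toy data (the lowercase names `noise` and `noise2` are stand-ins, not the notebook's objects), this can be verified directly with `DataFrame.equals`:

```python
import pandas as pd

# Hypothetical toy frame without nulls, standing in for "Noise".
noise = pd.DataFrame({
    "IDTRAM": [1, 2, 3],
    "TVLVES": [69.0, 63.0, 58.0],
})

noise2 = noise.dropna(subset=["IDTRAM", "TVLVES"])

same_length = len(noise) == len(noise2)  # no rows were removed
same_content = noise.equals(noise2)      # same values, dtypes and index

print(same_length, same_content)
```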