# Instructions

1 - Download the municipal data for the state of Minas Gerais; <br>
2 - Transform the deforestation data;<br>
3 - Process the data: <br>
    - reproject to EPSG:31983; <br>
    - compute the area; <br>
4 - Generate the output files in GeoJSON format;


In [1]:
from data_utils.data_downloader import download_data_from_url
from data_utils.data_merger import merge_datasets

<br>

# 1 - Download the municipal data for MG

In [2]:
# Input parameters
URL = 'https://raw.githubusercontent.com/tbrugz/geodata-br/master/geojson/geojs-31-mun.json'
DEST_PATH = './dados/'
FILE_NAME = 'municipios-mg.geojson'

In [3]:
download_data_from_url?

Signature: download_data_from_url(data_url, dest_path, file_name)
Docstring:
  Downloader function that will do the following steps, in order:
    - Download Geographic data from a web URL
    - Parse the data from bytes to GeoDataFrame
    - Changes the coordinates projection to 'EPSG:31983'
    - Calculates the area in [km^2] for all counties
    - Saves the calculated area in a separate column called 'area'
    - Stores the data in a .geojson file
Args:
    data_url (str): URL to the data set.
    dest_path (str): Destination path where the dataset will be saved.
    file_name (str): Output file name with .geojson extension.
    
Returns:
    (None): Saves the data file with the file_name in the specified dest_path.
File:      c:\repos\data-scientist-test-jan-2024\data_utils\data_downloader.py
Type:      function
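
The docstring above says the area is computed in km² after reprojecting to EPSG:31983, a projected CRS whose coordinates are in metres. The sketch below illustrates only the arithmetic of that step, using the shoelace formula on a polygon whose vertices are assumed to be already projected. It is not the implementation in `data_downloader.py` (which, per the docstring, works on a GeoDataFrame); the function names `ring_area_m2` and `polygon_area_km2` and the sample square are hypothetical.

```python
def ring_area_m2(ring):
    """Unsigned area of a ring of (x, y) vertices in metres (shoelace formula).

    Works whether or not the ring repeats its first vertex at the end,
    because the extra wrap-around term is zero in that case.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area_km2(exterior, holes=()):
    """Area of a polygon (exterior ring minus holes), converted from m^2 to km^2."""
    area_m2 = ring_area_m2(exterior) - sum(ring_area_m2(h) for h in holes)
    return area_m2 / 1_000_000.0


# Hypothetical 2 km x 1 km rectangle in projected (metre) coordinates.
square = [(0.0, 0.0), (2000.0, 0.0), (2000.0, 1000.0), (0.0, 1000.0)]
area = polygon_area_km2(square)
print(f"{area:.2f} km^2")
```

The division by 1,000,000 is the only unit conversion involved. Running the same formula on unprojected longitude/latitude degrees would not give a meaningful area, which is why the reprojection comes first.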

In [4]:
download_data_from_url(URL, DEST_PATH, FILE_NAME)

[2024-01-07 20:47:01,928] - INFO - data_downloader.py - 31: Downloading data from: https://raw.githubusercontent.com/tbrugz/geodata-br/master/geojson/geojs-31-mun.json
[2024-01-07 20:47:03,253] - INFO - data_downloader.py - 38: Data was retrieved from the url
[2024-01-07 20:47:03,254] - INFO - data_downloader.py - 40: Parsing data with GeoPandas
[2024-01-07 20:47:04,369] - DEBUG - session.py - 16: Could not import boto3, continuing with reduced functionality.
[2024-01-07 20:47:04,372] - DEBUG - env.py - 658: GDAL data found in package: path='C:\\Users\\z00426st\\.conda\\envs\\generic-ml\\lib\\site-packages\\fiona\\gdal_data'.
[2024-01-07 20:47:04,374] - DEBUG - env.py - 684: PROJ data found in package: path='C:\\Users\\z00426st\\.conda\\envs\\generic-ml\\lib\\site-packages\\fiona\\proj_data'.
[2024-01-07 20:47:04,407] - DEBUG - env.py - 315: GDAL data found in package: path='C:\\Users\\z00426st\\.conda\\envs\\generic-ml\\lib\\site-packages\\fiona\\gdal_data'.
[2024-01-07 20:47:04,408] 
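
The log lines above follow a `[timestamp] - LEVEL - file - line: message` layout, and the DEBUG entries from `session.py` and `env.py` appear to come from third-party libraries rather than from `data_downloader.py`. As a rough sketch, assuming the standard `logging` module is in use, a format string like the one below reproduces that layout, and an INFO level hides DEBUG records on that logger. The logger name `pipeline_sketch` and the helper `build_logger` are hypothetical and are not taken from the repository.

```python
import logging

# Format string assumed from the layout of the log lines in the notebook output.
LOG_FORMAT = "[%(asctime)s] - %(levelname)s - %(filename)s - %(lineno)d: %(message)s"


def build_logger(name="pipeline_sketch", level=logging.INFO):
    """Return a logger that writes records in the layout shown above."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines through the root logger
    if not logger.handlers:   # do not stack handlers when a cell is re-run
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    return logger


logger = build_logger()
logger.info("Parsing data with GeoPandas")
logger.debug("This record is dropped because the level is INFO")
```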

<br>

# 2 - Transform the deforestation data

In [5]:
# Input parameters
DATASET1_PATH = './dados/desmatamento_ago22.gpkg'
DATASET2_PATH = './dados/desmatamento_set22.gpkg'
FILE_NAME_DESMATAMENTO = 'focos-desmatamento-mg.geojson'

In [6]:
merge_datasets?

Signature: merge_datasets(dataset1_path, dataset2_path, dest_path, file_name)
Docstring:
    Function that will read two data sets and join them in the axis 0 (append operation).
    The data sets will contain geographic data so GeoPandas will be used to parse them.

Args:
    dataset1_path (str): Path to the first data set to be merged.
    dataset2_path (str): Path to the second data set to be merged.
    dest_path (str): Destination path where the merged dataset will be saved.
    file_name (str): Output file name with .geojson extension.

Returns:
    (None): Saves the merged data frame with --file-name in the specified --dest-path.
File:      c:\repos\data-scientist-test-jan-2024\data_utils\data_merger.py
Type:      function
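
The docstring above describes the merge as a join along axis 0, that is, an append: the rows of the second data set are stacked under the rows of the first. The sketch below shows that idea on two small GeoJSON `FeatureCollection` dictionaries using only the standard library. It is not the code in `data_merger.py`, which per the docstring parses the files with GeoPandas; the helper `merge_feature_collections`, the `month` property, and the sample coordinates are all hypothetical.

```python
import json


def merge_feature_collections(first, second):
    """Append the features of `second` after those of `first` (row-wise stack)."""
    for fc in (first, second):
        if fc.get("type") != "FeatureCollection":
            raise ValueError("expected a GeoJSON FeatureCollection")
    return {
        "type": "FeatureCollection",
        "features": list(first["features"]) + list(second["features"]),
    }


def point_feature(lon, lat, month):
    """Build a minimal GeoJSON point feature for the example."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"month": month},
    }


# Hypothetical stand-ins for the August and September data sets.
august = {"type": "FeatureCollection",
          "features": [point_feature(-44.0, -19.9, "ago22")]}
september = {"type": "FeatureCollection",
             "features": [point_feature(-43.9, -19.8, "set22"),
                          point_feature(-45.1, -18.5, "set22")]}

merged = merge_feature_collections(august, september)
print(json.dumps(merged)[:60], "...")
```

An append like this keeps every feature from both inputs and does not deduplicate or match rows by key, so it only makes sense when the two files share the same schema and coordinate reference system.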

In [None]:
merge_datasets(DATASET1_PATH, DATASET2_PATH, DEST_PATH, FILE_NAME_DESMATAMENTO)

[2024-01-07 20:47:11,260] - INFO - data_merger.py - 33: Reading dataset 1
[2024-01-07 20:47:11,262] - DEBUG - env.py - 315: GDAL data found in package: path='C:\\Users\\z00426st\\.conda\\envs\\generic-ml\\lib\\site-packages\\fiona\\gdal_data'.
[2024-01-07 20:47:11,263] - DEBUG - env.py - 315: PROJ data found in package: path='C:\\Users\\z00426st\\.conda\\envs\\generic-ml\\lib\\site-packages\\fiona\\proj_data'.
[2024-01-07 20:47:11,305] - DEBUG - env.py - 315: GDAL data found in package: path='C:\\Users\\z00426st\\.conda\\envs\\generic-ml\\lib\\site-packages\\fiona\\gdal_data'.
[2024-01-07 20:47:11,307] - DEBUG - env.py - 315: PROJ data found in package: path='C:\\Users\\z00426st\\.conda\\envs\\generic-ml\\lib\\site-packages\\fiona\\proj_data'.
[2024-01-07 20:47:11,307] - DEBUG - collection.py - 307: Got coordinate system
[2024-01-07 20:47:11,308] - DEBUG - collection.py - 300: Got coordinate system
[2024-01-07 20:47:11,308] - DEBUG - file.py - 327: Matched. confidence=100, c_code=b'432