## Data management

### Introduction

CARTOframes is built on top of [Pandas](https://pandas.pydata.org/) and [GeoPandas](https://geopandas.org/), so it is compatible with the data formats those projects support, such as CSV, GeoJSON, and Shapefile. This guide shows how to load different data files into DataFrames, and how to interact with the CARTO platform to upload DataFrames into tables and download tables or SQL queries into DataFrames.

There are two main concepts we should know before continuing with the guide:
- A [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) is a two-dimensional data structure for generic data. It can be thought of as a table with rows and columns. It's composed of [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) objects which are one-dimensional data structures.
- A [GeoDataFrame](https://geopandas.org/data_structures.html#geodataframe) is a DataFrame with an extra geometry column. This geometry column is a [GeoSeries](https://geopandas.org/data_structures.html#geoseries) object.

Whenever we work with geographic data, we should use a GeoDataFrame. If we use a DataFrame with an encoded geometry column instead (WKB, WKT, etc.), every method accepts a `geom_col` parameter that takes the name of that column, so the geometry can be decoded internally.
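
To make the difference between an encoded geometry column and a real geometry column concrete, here is a minimal sketch that decodes a WKT column by hand with GeoPandas. It does not show how `geom_col` is implemented internally. The column name `the_geom_wkt` and the sample rows are hypothetical, and the sketch assumes a GeoPandas version that provides `GeoSeries.from_wkt`.

```python
import pandas as pd
from geopandas import GeoDataFrame, GeoSeries

# A plain DataFrame whose geometry is stored as WKT text
# (hypothetical sample data and column name, for illustration only)
df = pd.DataFrame({
    'name': ['a', 'b'],
    'the_geom_wkt': ['POINT (-122.4 37.78)', 'POINT (-122.39 37.72)'],
})

# Decode the WKT strings into geometry objects
geometry = GeoSeries.from_wkt(df['the_geom_wkt'])

# Build a GeoDataFrame with a real geometry column,
# assuming the coordinates are longitude/latitude (EPSG:4326)
gdf = GeoDataFrame(df.drop(columns='the_geom_wkt'), geometry=geometry, crs='EPSG:4326')

print(type(df['the_geom_wkt']).__name__)  # pandas Series of strings
print(type(gdf.geometry).__name__)        # GeoSeries of geometries
```

Once the data is a GeoDataFrame, no `geom_col` argument should be needed, because the geometry column is already decoded.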

For more information, see the [data management examples](https://carto.com/developers/cartoframes/examples/#example-data-management).

#### Steps


To show how to manage your data with CARTOframes, we will follow these steps:

- Load San Francisco neighborhoods from a GeoJSON file
- Load San Francisco incidents from a CSV file
- Upload data to CARTO
- Calculate their intersection using a SQL query
- Download the result

### Load San Francisco neighborhoods from a GeoJSON file

Let's start by reading an external GeoJSON file to load and visualize the San Francisco neighborhoods.

In [1]:
from geopandas import read_file

neighborhoods_gdf = read_file('https://data.sfgov.org/api/geospatial/pty2-tcw4?method=export&format=GeoJSON')
neighborhoods_gdf.head()

| | link | name | geometry |
|---|---|---|---|
| 0 | http://en.wikipedia.org/wiki/Sea_Cliff,_San_Fr... | Seacliff | MULTIPOLYGON (((-122.49346 37.78352, -122.4937... |
| 1 | | Lake Street | MULTIPOLYGON (((-122.48715 37.78379, -122.4872... |
| 2 | http://www.nps.gov/prsf/index.htm | Presidio National Park | MULTIPOLYGON (((-122.47758 37.81099, -122.4771... |
| 3 | | Presidio Terrace | MULTIPOLYGON (((-122.47241 37.78735, -122.4710... |
| 4 | http://www.sfgate.com/neighborhoods/sf/innerri... | Inner Richmond | MULTIPOLYGON (((-122.47263 37.78631, -122.4668... |


In [2]:
from cartoframes.viz import Layer

Layer(neighborhoods_gdf)

### Load San Francisco incidents from a CSV file

Let's do the same with San Francisco incidents. In this case, we will work with an external CSV file.

In [3]:
from pandas import read_csv
from geopandas import GeoDataFrame, points_from_xy

df = read_csv('http://data.sfgov.org/resource/wg3w-h783.csv')

# Clean NaN values
df = df[df['longitude'].notna()]

incidents_gdf = GeoDataFrame(df, geometry=points_from_xy(df['longitude'], df['latitude']))
incidents_gdf.head()

| | incident_datetime | incident_date | incident_time | incident_year | incident_day_of_week | report_datetime | row_id | incident_id | incident_number | cad_number | ... | :@computed_region_qgnn_b9vv | :@computed_region_26cr_cadq | :@computed_region_ajp5_b2md | :@computed_region_nqbw_i6c3 | :@computed_region_2dwj_jsy4 | :@computed_region_h4ep_8xdi | :@computed_region_y6ts_4iup | :@computed_region_jg9y_a9du | :@computed_region_6pnf_4xz7 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-01T01:00:00.000 | 2019-05-01T00:00:00.000 | 01:00 | 2019 | Wednesday | 2019-06-12T20:27:00.000 | 81097515200 | 810975 | 190424067 | 191634131.0 | ... | 10.0 | 7.0 | 35.0 | | | | | | 1.0 | POINT (-122.49963 37.76257) |
| 1 | 2019-06-22T07:45:00.000 | 2019-06-22T00:00:00.000 | 07:45 | 2019 | Saturday | 2019-06-22T08:05:00.000 | 81465564020 | 814655 | 190450880 | 191730737.0 | ... | 1.0 | 10.0 | 34.0 | 1.0 | | 1.0 | | | 2.0 | POINT (-122.40816 37.78054) |
| 2 | 2019-06-03T16:16:00.000 | 2019-06-03T00:00:00.000 | 16:16 | 2019 | Monday | 2019-06-03T16:16:00.000 | 80769875000 | 807698 | 190397016 | 191533509.0 | ... | 2.0 | 9.0 | 1.0 | | | | | | 2.0 | POINT (-122.39075 37.72160) |
| 3 | 2018-11-16T16:34:00.000 | 2018-11-16T00:00:00.000 | 16:34 | 2018 | Friday | 2018-11-16T16:34:00.000 | 73857915041 | 738579 | 180870806 | 183202539.0 | ... | 6.0 | 3.0 | 6.0 | | 18.0 | | | | 2.0 | POINT (-122.40488 37.79486) |
| 4 | 2019-05-27T02:25:00.000 | 2019-05-27T00:00:00.000 | 02:25 | 2019 | Monday | 2019-05-27T02:55:00.000 | 80509204134 | 805092 | 190378555 | 191470256.0 | ... | 4.0 | 6.0 | 13.0 | | | | | | 1.0 | POINT (-122.43056 37.79772) |
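
The cleaning step above filters on `longitude` only. If `latitude` can be missing independently (an assumption; in this dataset the two may always be missing together), a slightly more defensive variant drops rows where either coordinate is absent. The sketch below uses a small hypothetical DataFrame rather than the real incidents file.

```python
import pandas as pd

# Hypothetical sample standing in for the incidents data
sample_df = pd.DataFrame({
    'incident_id': [1, 2, 3, 4],
    'longitude': [-122.49963, None, -122.39075, -122.40488],
    'latitude': [37.76257, 37.78054, None, 37.79486],
})

# Keep only the rows where both coordinates are present
clean_df = sample_df.dropna(subset=['longitude', 'latitude'])

print(clean_df['incident_id'].tolist())
```

Rows 2 and 3 are dropped because each lacks one coordinate, so every remaining row can be turned into a valid point.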


In [4]:
from cartoframes.viz import Layer

Layer(incidents_gdf)

### Upload data to CARTO

Let's upload both GeoDataFrames to CARTO so we can see how to interact with the platform. To continue, you need to set your CARTO credentials. If you aren't sure about your API key, check the [Authentication guide](/developers/cartoframes/guides/Authentication/) to learn how to get it.
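
The next cell loads credentials from a file named `creds.json`. As a rough sketch, such a file could be generated as follows, assuming it is a JSON object with `username` and `api_key` keys; check the Authentication guide for the exact format your version expects. The values are placeholders, and the file is written to a temporary directory here so nothing in your project is overwritten.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values: replace them with your own CARTO user name and API key.
# The key names are an assumption based on the Authentication guide.
credentials = {
    'username': 'your_user_name',
    'api_key': 'your_api_key',
}

# Write the file to a temporary directory for this illustration
creds_path = Path(tempfile.mkdtemp()) / 'creds.json'
creds_path.write_text(json.dumps(credentials, indent=4))

print(creds_path.read_text())
```

Keeping the credentials in a file that is excluded from version control avoids hard-coding the API key in notebooks.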

In [5]:
from cartoframes.auth import set_default_credentials

set_default_credentials('creds.json')

In [6]:
from cartoframes import to_carto


neighborhoods_table = 'sf_neighborhoods'
incidents_table = 'sf_incidents'

to_carto(neighborhoods_gdf, neighborhoods_table, if_exists='replace')
to_carto(incidents_gdf, incidents_table, if_exists='replace')

Success! Data uploaded to table "sf_neighborhoods" correctly
Success! Data uploaded to table "sf_incidents" correctly


'sf_incidents'

Now that we have uploaded the data, we can directly visualize the tables using:
```python
Layer(neighborhoods_table)
Layer(incidents_table)
```

### Calculate their intersection using a SQL query

Let's see how we can apply a SQL query to intersect both tables and download the result of the query to visualize it.

In [7]:
from cartoframes import read_carto

incidents_neighborhoods_gdf = read_carto('''
    SELECT n.cartodb_id, n.the_geom, n.the_geom_webmercator, n.name, COUNT(*) AS incidents
    FROM sf_incidents i INNER JOIN sf_neighborhoods n ON ST_Intersects(i.the_geom, n.the_geom)
    GROUP BY n.cartodb_id
''')
incidents_neighborhoods_gdf.head()

| | cartodb_id | the_geom | name | incidents |
|---|---|---|---|---|
| 0 | 635 | MULTIPOLYGON (((-122.41062 37.79088, -122.4104... | Lower Nob Hill | 21 |
| 1 | 612 | MULTIPOLYGON (((-122.42880 37.77232, -122.4237... | Mint Hill | 3 |
| 2 | 625 | MULTIPOLYGON (((-122.50876 37.73787, -122.5069... | Parkside | 10 |
| 3 | 604 | MULTIPOLYGON (((-122.40238 37.79097, -122.4019... | Downtown / Union Square | 28 |
| 4 | 667 | MULTIPOLYGON (((-122.40524 37.74305, -122.3997... | Apparel City | 5 |
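
For comparison, the same kind of count can be sketched locally with GeoPandas instead of PostGIS. This is not what CARTO runs. It is a small illustration of the logic behind the query (a spatial join on intersection followed by a count per neighborhood), using hypothetical toy geometries and assuming a GeoPandas version whose `sjoin` accepts the `predicate` argument.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two toy "neighborhoods" (hypothetical squares that do not touch)
toy_neighborhoods = gpd.GeoDataFrame(
    {'name': ['West', 'East']},
    geometry=[box(0, 0, 1, 1), box(1.5, 0, 2.5, 1)],
    crs='EPSG:4326',
)

# Four toy "incidents"; the last one falls outside every neighborhood
toy_incidents = gpd.GeoDataFrame(
    {'incident_id': [1, 2, 3, 4]},
    geometry=[Point(0.2, 0.2), Point(0.8, 0.5), Point(2.0, 0.5), Point(5.0, 5.0)],
    crs='EPSG:4326',
)

# Counterpart of INNER JOIN ... ON ST_Intersects(i.the_geom, n.the_geom)
joined = gpd.sjoin(toy_incidents, toy_neighborhoods, how='inner', predicate='intersects')

# Counterpart of COUNT(*) ... GROUP BY, keyed by neighborhood name
counts = joined.groupby('name').size().rename('incidents').reset_index()

# Attach the counts back to the neighborhood geometries
toy_result = toy_neighborhoods.merge(counts, on='name')
print(toy_result[['name', 'incidents']])
```

As with the SQL inner join, neighborhoods without any incident would be absent from the result, and incidents outside every neighborhood are ignored.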


In [8]:
from cartoframes.viz import color_continuous_style

Layer(incidents_neighborhoods_gdf, style=color_continuous_style('incidents'))

### Conclusion

Congratulations! You have seen how to load data locally, upload it to CARTO, apply a SQL query, and download the results. We recommend uploading your data to CARTO when it is too big (> 30MB) to be visualized from a GeoDataFrame or when you want to run PostGIS queries on it.
