# Introduction

...

# Working with CSV data

In the previous tutorial, you learned how to use the [GeoPandas](http://geopandas.org/) library to work with shapefile data. To create a GeoDataFrame from a CSV file, we'll need to use both Pandas and GeoPandas.

In [None]:
import geopandas as gpd
import pandas as pd

We begin by creating a DataFrame.

In [None]:
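# Read in the data (the file must be a CSV with "Latitude" and "Longitude" columns;
# a shapefile would be read with gpd.read_file() instead)
df = pd.read_csv("../input/geospatial-course-data/dec_lands/DEC_lands.shp")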

Then, to get a GeoDataFrame, we use `gpd.GeoDataFrame()`, where the `gpd.points_from_xy()` function creates `Point` objects from the longitude (x) and latitude (y) coordinates.

In [None]:
# Create a GeoDataFrame
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude))
gdf.head()
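
To try the same steps without any data files, here is a minimal sketch that reads a small CSV from an in-memory string. The site names and coordinates are made up for illustration; only the `Latitude` and `Longitude` column names follow the example above.

```python
import io

import geopandas as gpd
import pandas as pd

# Hypothetical CSV contents, inlined so the example runs without any files
csv_text = """Name,Latitude,Longitude
Site A,42.65,-73.75
Site B,43.05,-76.15
"""

sample_df = pd.read_csv(io.StringIO(csv_text))

# points_from_xy() takes x first (longitude), then y (latitude)
sample_gdf = gpd.GeoDataFrame(
    sample_df,
    geometry=gpd.points_from_xy(sample_df.Longitude, sample_df.Latitude),
)

print(sample_gdf.geometry.iloc[0])
```

Note the argument order: swapping latitude and longitude is a common mistake, and it silently places the points in the wrong part of the world.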

Once we have a GeoDataFrame, we can quickly plot it!

In [None]:
gdf.plot()

# Coordinate reference system (CRS)

All of the maps you will create in this course portray the surface of the earth in two dimensions.  But, as you know, the world is actually a three-dimensional globe, and so we have to use a special method, called a **map projection**, to render it as a flat surface.  No map projection can ever be 100% accurate.  But, depending on the regional extent of your map and the type of analysis you plan to do, some are better than others.

We use a **coordinate reference system (CRS)** to show how the points on the map correspond to real locations on Earth. We identify coordinate reference systems by their [European Petroleum Survey Group (EPSG)](http://www.epsg.org/) codes. For instance, EPSG code 4326 corresponds to coordinates in latitude and longitude.
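
If you'd like to see what an EPSG code stands for, one option is to look it up with the `pyproj` library, which GeoPandas uses under the hood. This is only a sketch: EPSG 26918 is included as a second example because it is a projected CRS, in contrast to the latitude/longitude coordinates of EPSG 4326.

```python
from pyproj import CRS

geographic = CRS.from_epsg(4326)   # latitude and longitude, in degrees
projected = CRS.from_epsg(26918)   # a projected CRS, with coordinates in meters

# A geographic CRS locates points on the globe; a projected CRS flattens it
print(geographic.name, geographic.is_geographic)
print(projected.name, projected.is_projected)
```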

When creating a GeoDataFrame from a CSV file, we have to set the appropriate CRS.

In [None]:
# Set the coordinate reference system (CRS) to EPSG 4326
gdf = gdf.set_crs(epsg=4326)
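
As a self-contained sketch of this step, the example below builds a one-row GeoDataFrame from made-up coordinates, confirms that it starts without a CRS, and then assigns EPSG 4326 with `set_crs()`. The names `coords` and `points` are hypothetical.

```python
import geopandas as gpd
import pandas as pd

# Made-up coordinates standing in for data loaded from a CSV file
coords = pd.DataFrame({"Latitude": [42.65], "Longitude": [-73.75]})
points = gpd.GeoDataFrame(
    coords, geometry=gpd.points_from_xy(coords.Longitude, coords.Latitude)
)

# A GeoDataFrame built from plain coordinates has no CRS yet
crs_before = points.crs
print(crs_before)

# set_crs() labels the existing coordinates; it does not move them
points = points.set_crs(epsg=4326)
print(points.crs.to_epsg())
```

Setting the CRS only declares how the existing numbers should be interpreted. Converting coordinates from one CRS to another is a separate operation.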

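Re-projecting a GeoDataFrame changes its CRS, and each choice of CRS yields a different map. We can do this with the `to_crs()` method. When plotting several GeoDataFrames together, they should all use the same CRS.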

In [None]:
ax = counties.to_crs(epsg=26918).plot(figsize=(10,10), color='none', edgecolor='gainsboro', zorder=3)
wild_lands.plot(color='lightgreen', ax=ax)
campsites.plot(color='maroon', markersize=2, ax=ax)
trails.plot(color='black', markersize=1, ax=ax)
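
To see what `to_crs()` does to the underlying coordinates, here is a small sketch with a single made-up point. It assumes a point at longitude -75, which is the central meridian of UTM zone 18N (the zone behind EPSG 26918), so its projected x-coordinate should land close to that zone's false easting of 500,000 meters.

```python
import geopandas as gpd
from shapely.geometry import Point

# One hypothetical point, given as (longitude, latitude) in EPSG 4326
lonlat = gpd.GeoDataFrame(geometry=[Point(-75.0, 40.0)], crs="EPSG:4326")

# to_crs() returns a new GeoDataFrame with re-projected coordinates
utm = lonlat.to_crs(epsg=26918)

print(lonlat.geometry.iloc[0])  # coordinates in degrees
print(utm.geometry.iloc[0])     # coordinates in meters
```

The original GeoDataFrame is left unchanged, which is why the cell above can re-project `counties` on the fly just for plotting.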

We won't go into how to select an appropriate CRS for your data in this micro-course. But, if you'd like to learn more, see [this guide to map projections](https://www.axismaps.com/guide/general/map-projections/).

# Attributes of geometric objects

As you learned in the previous tutorial, for an arbitrary GeoDataFrame, the type in the "geometry" column depends on what we are trying to show. For instance, we might use:
- a `Point` for the epicenter of an earthquake,
- a `LineString` for a street, or
- a `Polygon` to show country boundaries.
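
These geometry types come from the Shapely library, which GeoPandas builds on. A minimal sketch, with made-up coordinates, of creating one object of each type directly:

```python
from shapely.geometry import LineString, Point, Polygon

# Hypothetical coordinates, chosen only to illustrate each type
epicenter = Point(-122.3, 47.6)
street = LineString([(0, 0), (1, 0), (1, 2)])
boundary = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])

for geom in (epicenter, street, boundary):
    print(geom.geom_type)
```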

All three types of geometric objects have built-in attributes that you can use to quickly analyze the dataset.  For instance, you can get the area of a polygon from the `area` attribute.

In [None]:
# Calculate the area (in square meters) of each polygon in the GeoDataFrame 
wild_lands.loc[:, "AREA"] = wild_lands.geometry.area 
wild_lands.head()


With this information, we can check how many square kilometers of forest and wilderness can be found in the state of New York.  
> If you need to review how to use `groupby()`, check out [this tutorial](https://www.kaggle.com/residentmario/grouping-and-sorting) from the Pandas micro-course.

```python
# Divide by 10**6 to convert square meters to square kilometers
wild_lands.groupby('CLASS').AREA.sum() / 10**6
```

New York has approximately 6193 square kilometers of forest and 5494 square kilometers of wilderness.
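
To check the unit conversion on numbers that are easy to verify by hand, here is a sketch with three made-up rectangles. The class labels and the `parcels` name are hypothetical, and the example assumes a CRS whose units are meters (EPSG 26918), so that `area` comes out in square meters.

```python
import geopandas as gpd
from shapely.geometry import box

# Three hypothetical rectangles; box(minx, miny, maxx, maxy) in meters
parcels = gpd.GeoDataFrame(
    {"CLASS": ["FOREST", "FOREST", "WILDERNESS"]},
    geometry=[
        box(0, 0, 1000, 2000),     # 1 km x 2 km
        box(2000, 0, 3000, 1000),  # 1 km x 1 km
        box(0, 3000, 4000, 4000),  # 4 km x 1 km
    ],
    crs="EPSG:26918",
)

# Areas are in square meters because the CRS units are meters
parcels["AREA"] = parcels.geometry.area

# Sum per class, then convert square meters to square kilometers
area_km2 = parcels.groupby("CLASS")["AREA"].sum() / 10**6
print(area_km2)
```

If the GeoDataFrame were in EPSG 4326 instead, `area` would be computed in squared degrees, which is not a meaningful measure of land area. Re-project to a CRS in meters before calculating areas.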

You can also get the x- and y-coordinates of a `Point` from the `x` and `y` attributes, respectively.
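
A minimal sketch of reading point coordinates, using two made-up locations in EPSG 4326 (the `sites` name is hypothetical). With this CRS, `x` holds the longitude and `y` holds the latitude.

```python
import geopandas as gpd
from shapely.geometry import Point

# Two hypothetical points, given as (longitude, latitude)
sites = gpd.GeoDataFrame(
    geometry=[Point(-73.75, 42.65), Point(-76.15, 43.05)],
    crs="EPSG:4326",
)

# On a GeoSeries of points, x and y return one value per row
print(sites.geometry.x)
print(sites.geometry.y)

# A single Point exposes the same attributes
first = sites.geometry.iloc[0]
print(first.x, first.y)
```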