# Introduction

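In this tutorial, you'll learn how to create a GeoDataFrame from CSV data, how coordinate reference systems work, and how to use the attributes of geometric objects in GeoPandas.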

# Working with CSV data

So far, you have learned how to use the GeoPandas library to work with shapefile data.  To create a GeoDataFrame from a CSV file, we'll need to use both Pandas and GeoPandas.

In [None]:
import geopandas as gpd
import pandas as pd

We begin by creating a DataFrame `facilities_df` containing health facilities in Ghana.

In [None]:
# Create a DataFrame with health facilities in Ghana
facilities_df = pd.read_csv("../input/geospatial-course-data/ghana/health_facilities.csv")
print(type(facilities_df))
facilities_df.head()

To convert it to a GeoDataFrame, we use `gpd.GeoDataFrame()`.  The `gpd.points_from_xy()` function creates `Point` objects from the latitude and longitude coordinates.

In [None]:
# Convert the DataFrame to a GeoDataFrame
facilities = gpd.GeoDataFrame(facilities_df, geometry=gpd.points_from_xy(facilities_df.Longitude, facilities_df.Latitude))
print(type(facilities))
facilities.head()
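
Note the argument order in the cell above: `gpd.points_from_xy()` takes x values first and y values second, so longitude comes before latitude. The standard-library sketch below illustrates the same ordering without GeoPandas. The CSV rows and facility names are made up for illustration; only the `Latitude` and `Longitude` column names come from the code above.

```python
import csv
import io

# Hypothetical CSV content; the rows are invented for illustration.
CSV_TEXT = """FacilityName,Latitude,Longitude
Clinic A,5.60,-0.19
Clinic B,6.69,-1.62
"""

def rows_to_xy(csv_text):
    """Return (x, y) pairs, where x is longitude and y is latitude."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(row["Longitude"]), float(row["Latitude"])) for row in reader]

points = rows_to_xy(CSV_TEXT)
print(points)
```

Swapping the two columns is a common mistake, and it typically places the points in the wrong part of the world.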

# Coordinate reference system (CRS)

All of the maps you will create in this course portray the surface of the earth in two dimensions.  But, as you know, the world is actually a three-dimensional globe, and so we have to use a special method, called a **map projection**, to render it as a flat surface.  No map projection can ever be 100% accurate.  But, depending on the regional extent of your map and the type of analysis you plan to do, some are better than others.

We use a **coordinate reference system (CRS)** to show how the points on the map correspond to real locations on Earth.  And, we identify different coordinate reference systems according to their [European Petroleum Survey Group (EPSG)](http://www.epsg.org/) codes.  For instance, EPSG code 4326 corresponds to coordinates in latitude and longitude.

When creating a GeoDataFrame _from a CSV file_, we have to set the CRS. 

In [None]:
# Set coordinate reference system (CRS) to EPSG 4326
facilities = facilities.set_crs(epsg=4326)

When we create a GeoDataFrame _from a shapefile_, the CRS is already imported for us (and it was saved as part of the shapefile).  For instance, in the code cell below, the CRS of the GeoDataFrame `regions` is [EPSG 32630](https://epsg.io/32630). 

In [None]:
# Load a GeoDataFrame containing regions in Ghana
regions = gpd.read_file("../input/geospatial-course-data/ghana/Regions/Map_of_Regions_in_Ghana.shp")
print(regions.crs)
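
EPSG 32630 identifies WGS 84 / UTM zone 30N, a projected CRS with units of meters. As a rough sketch of where such codes come from, the hypothetical helper `utm_epsg()` below derives the WGS 84 UTM code for a single longitude/latitude pair. It ignores the irregular zones around Norway and Svalbard, and it is not part of GeoPandas.

```python
import math

def utm_epsg(longitude, latitude):
    """Return the EPSG code of the WGS 84 / UTM zone containing a point.

    Simplified: the irregular zones near Norway and Svalbard are ignored.
    """
    # UTM zones are 6 degrees wide and numbered 1-60, starting at 180 W
    zone = min(int(math.floor((longitude + 180) / 6)) + 1, 60)
    # 326xx codes are northern-hemisphere zones, 327xx are southern
    base = 32600 if latitude >= 0 else 32700
    return base + zone

# A point in central Ghana (approximate coordinates, for illustration only)
print(utm_epsg(-1.0, 7.9))  # 32630
```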

When plotting multiple GeoDataFrames, it's important that they all use the same CRS.  The code cell below uses the `to_crs()` method to change the CRS of the `facilities` GeoDataFrame to match the CRS of `regions` before plotting it.

In [None]:
# Create a map
ax = regions.plot(figsize=(8,8), color='whitesmoke', linestyle=':', edgecolor='black')
facilities.to_crs(epsg=32630).plot(markersize=1, ax=ax)

Note that the `to_crs()` method returns a new GeoDataFrame in which only the "geometry" column is transformed: all other columns are left as-is.

In [None]:
# The "Latitude" and "Longitude" columns are unchanged
facilities.to_crs(epsg=32630).head()

# Attributes of geometric objects

As you learned in the first tutorial, for an arbitrary GeoDataFrame, the type in the "geometry" column depends on what we are trying to show: for instance, we might use:
- a `Point` for the epicenter of an earthquake, 
- a `LineString` for a street, or 
- a `Polygon` to show country boundaries.

All three types of geometric objects have built-in attributes that you can use to quickly analyze the dataset.  For instance, you can get the x- and y-coordinates of a `Point` from the `x` and `y` attributes, respectively.

In [None]:
# Get the x-coordinate of each point
facilities.geometry.x.head()

And, you can get the length of a `LineString` from the `length` attribute.  

Or, you can get the area of a `Polygon` from the `area` attribute.

In [None]:
# Calculate the area (in square kilometers) of each polygon in the GeoDataFrame
regions.loc[:, "AREA"] = regions.geometry.area / 10**6

print("Area of Ghana: {} square kilometers".format(regions.AREA.sum()))
regions.head()

In the code cell above, we divide by $10^6$ to convert from units of square meters to units of square kilometers.  
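
To make the unit conversion concrete, here is a minimal sketch that computes a polygon's planar area with the shoelace formula and converts it to square kilometers. The rectangle and the helper name `polygon_area()` are made up for illustration, and the coordinates are assumed to be in meters. It is an illustration of the idea rather than GeoPandas's own implementation.

```python
def polygon_area(vertices):
    """Planar area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

# Hypothetical rectangle, 2000 m by 3000 m, in a CRS whose units are meters
rectangle = [(0, 0), (2000, 0), (2000, 3000), (0, 3000)]
area_m2 = polygon_area(rectangle)
area_km2 = area_m2 / 10**6  # 1 square kilometer = 10**6 square meters
print(area_m2, area_km2)
```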

Note that the choice of CRS is very important for calculating area and distance (as some coordinate reference systems can distort these quantities).  We won't delve deeply into how to select an appropriate CRS for your data in this micro-course.  But, if you'd like to learn more, you can read more at [this link](https://www.axismaps.com/guide/general/map-projections/).
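
As a rough illustration of that distortion, the sketch below uses the haversine formula on a spherical Earth (an approximation, assuming a mean radius of 6371 km) to measure the ground distance between points that are one degree of longitude apart. In EPSG 4326 both pairs of points are exactly one unit apart, yet the ground distance at 60° latitude is about half of that at the equator. The function name `haversine_km()` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; treats the Earth as a sphere

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree of longitude, measured at two different latitudes
at_equator = haversine_km(0, 0, 1, 0)
at_60_north = haversine_km(0, 60, 1, 60)
print(round(at_equator, 1), round(at_60_north, 1))
```

This is why areas and distances are usually computed after reprojecting to a CRS with linear units, as the `regions` GeoDataFrame above does with EPSG 32630.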

# Your turn

Use what you've learned to **[track bird migration to South America](#$NEXT_NOTEBOOK_URL$)**.