# "Geospatial Analysis exercise - coordinate reference systems"
> "Geospatial Analysis exercise - coordinate reference systems"

- toc: true
- branch: master
- badges: true
- comments: true
- author: jaeeon
- categories: [jupyter, python]

**Introduction**

You are a bird conservation expert and want to understand migration patterns of purple martins.  In your research, you discover that these birds typically spend the summer breeding season in the eastern United States, and then migrate to South America for the winter.  But since this bird is under threat of endangerment, you'd like to take a closer look at the locations that these birds are more likely to visit.

There are several [protected areas](https://www.iucn.org/theme/protected-areas/about) in South America, which operate under special regulations to ensure that species that migrate (or live) there have the best opportunity to thrive.  You'd like to know if purple martins tend to visit these areas.  To answer this question, you'll use some recently collected data that tracks the year-round location of eleven different birds.

Before you get started, run the code cell below to set everything up.

In [3]:
import pandas as pd
import geopandas as gpd

from shapely.geometry import LineString

from learntools.core import binder
binder.bind(globals())
from learntools.geospatial.ex2 import *

# Exercises

### 1) Load the data.

Run the next code cell (without changes) to load the GPS data into a pandas DataFrame `birds_df`.  

In [4]:
# Load the data and print the first 5 rows
birds_df = pd.read_csv("../input/geospatial-learn-course-data/purple_martin.csv", parse_dates=['timestamp'])
print("There are {} different birds in the dataset.".format(birds_df["tag-local-identifier"].nunique()))
birds_df.head()

There are 11 birds in the dataset, where each bird is identified by a unique value in the "tag-local-identifier" column.  Each bird has several measurements, collected at different times of the year.

Use the next code cell to create a GeoDataFrame `birds`.  
- `birds` should have all of the columns from `birds_df`, along with a "geometry" column that contains Point objects with (longitude, latitude) locations.  
- Set the CRS of `birds` to `{'init': 'epsg:4326'}`.
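
As a quick illustration of the two requirements above, here is a minimal sketch on made-up coordinates. The toy values and the names `toy_df` and `toy_gdf` are hypothetical. It also assumes a GeoPandas release that accepts the string form `"EPSG:4326"`; the exercise checker expects the dictionary form given above, so treat this only as a way to see that longitude goes in as x and latitude as y.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical GPS fixes; the column names mirror purple_martin.csv
toy_df = pd.DataFrame({
    "tag-local-identifier": [1, 1],
    "location-long": [-77.0, -60.0],
    "location-lat": [39.0, -3.0],
})

# points_from_xy takes x first (longitude), then y (latitude)
toy_gdf = gpd.GeoDataFrame(
    toy_df,
    geometry=gpd.points_from_xy(toy_df["location-long"], toy_df["location-lat"]),
    crs="EPSG:4326",  # assumption: string form instead of {'init': 'epsg:4326'}
)

# The first Point carries longitude in .x and latitude in .y
first = toy_gdf.geometry.iloc[0]
print(first.x, first.y)
```

Swapping the two arguments would still produce valid Point objects, just in the wrong place on the map, which is why the preliminary plot in the next question is a useful check.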

In [8]:
# Your code here: Create the GeoDataFrame
birds = gpd.GeoDataFrame(birds_df,
                         geometry=gpd.points_from_xy(birds_df['location-long'], birds_df["location-lat"]))

# Your code here: Set the CRS to {'init': 'epsg:4326'}
birds.crs = {'init': 'epsg:4326'}

# Check your answer
q_1.check()

In [10]:
birds

In [9]:
# Lines below will give you a hint or solution code
#q_1.hint()
#q_1.solution()

### 2) Plot the data.

Next, we load in the `'naturalearth_lowres'` dataset from GeoPandas, and set `americas` to a GeoDataFrame containing the boundaries of all countries in the Americas (both North and South America).  Run the next code cell without changes.

In [11]:
# Load a GeoDataFrame with country boundaries in North/South America, print the first 5 rows
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
americas = world.loc[world['continent'].isin(['North America', 'South America'])]
americas.head()

Use the next code cell to create a single plot that shows both: (1) the country boundaries in the `americas` GeoDataFrame, and (2) all of the points in the `birds` GeoDataFrame.  

Don't worry about any special styling here; just create a preliminary plot, as a quick sanity check that all of the data was loaded properly.  In particular, you don't have to worry about color-coding the points to differentiate between birds, and you don't have to differentiate starting points from ending points.  We'll do that in the next part of the exercise.

In [14]:
# Your code here
ax = americas.plot(figsize=(8,8), color='whitesmoke', linestyle=':', edgecolor='black')
birds.plot(color='red', markersize=10, ax=ax)

# Uncomment to see a hint
#q_2.hint()

In [15]:
# Get credit for your work after you have created a map
q_2.check()

# Uncomment to see our solution (your code may look different!)
#q_2.solution()

### 3) Where does each bird start and end its journey? (Part 1)

Now, we're ready to look more closely at each bird's path.  Run the next code cell to create two GeoDataFrames:
- `path_gdf` contains LineString objects that show the path of each bird.  It uses the `LineString()` constructor to create a LineString object from a list of Point objects.
- `start_gdf` contains the starting points for each bird.

In [16]:
# GeoDataFrame showing path for each bird
path_df = birds.groupby("tag-local-identifier")['geometry'].apply(list).apply(lambda x: LineString(x)).reset_index()
path_gdf = gpd.GeoDataFrame(path_df, geometry=path_df.geometry)
path_gdf.crs = {'init' :'epsg:4326'}

# GeoDataFrame showing starting point for each bird
start_df = birds.groupby("tag-local-identifier")['geometry'].apply(list).apply(lambda x: x[0]).reset_index()
start_gdf = gpd.GeoDataFrame(start_df, geometry=start_df.geometry)
start_gdf.crs = {'init' :'epsg:4326'}

# Show first five rows of GeoDataFrame
start_gdf.head()
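
To see what the group-then-collect pattern in the cell above does, here is a small sketch on hypothetical data. The bird tags `"A"` and `"B"` and all coordinates are invented, and it uses plain pandas with shapely rather than the exercise data. Each bird's points are gathered into a list in row order, and that list yields the path, the first point, and the last point.

```python
import pandas as pd
from shapely.geometry import LineString, Point

# Hypothetical observations for two birds, already in time order
toy = pd.DataFrame({
    "tag-local-identifier": ["A", "A", "A", "B", "B"],
    "geometry": [Point(0, 0), Point(1, 1), Point(2, 0), Point(5, 5), Point(6, 4)],
})

# One list of Points per bird, preserving the row order within each group
points_per_bird = toy.groupby("tag-local-identifier")["geometry"].apply(list)

# The full path, the first point, and the last point of each bird
paths = points_per_bird.apply(lambda pts: LineString(pts))
starts = points_per_bird.apply(lambda pts: pts[0])
ends = points_per_bird.apply(lambda pts: pts[-1])

print(list(paths["A"].coords))
print(starts["B"].x, starts["B"].y)
print(ends["A"].x, ends["A"].y)
```

Because the lists keep the row order, the pattern only gives meaningful start and end points when the rows are sorted by time within each bird.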

Use the next code cell to create a GeoDataFrame `end_gdf` containing the final location of each bird.  
- The format should be identical to that of `start_gdf`, with two columns ("tag-local-identifier" and "geometry"), where the "geometry" column contains Point objects.
- Set the CRS of `end_gdf` to `{'init': 'epsg:4326'}`.

In [33]:
# Your code here
end_df = birds.groupby("tag-local-identifier")['geometry'].apply(list).apply(lambda x: x[-1]).reset_index()
end_df

In [34]:
end_gdf = gpd.GeoDataFrame(end_df, geometry=end_df.geometry)
end_gdf.crs = {'init': 'epsg:4326'}
end_gdf

In [35]:
# Check your answer
q_3.check()

In [32]:
# Lines below will give you a hint or solution code
#q_3.hint()
#q_3.solution()

### 4) Where does each bird start and end its journey? (Part 2)

Use the GeoDataFrames from the question above (`path_gdf`, `start_gdf`, and `end_gdf`) to visualize the paths of all birds on a single map.  You may also want to use the `americas` GeoDataFrame.

In [42]:
# Your code here
ax = americas.plot(figsize=(10, 10), color='whitesmoke', linestyle=':', edgecolor='black')

start_gdf.plot(color='red',  markersize=20, ax=ax)
path_gdf.plot(color='green',linestyle=':', linewidth=1, zorder=1, ax=ax)
end_gdf.plot(color='blue', markersize=20, ax=ax)

# Uncomment to see a hint
#q_4.hint()

In [43]:
# Get credit for your work after you have created a map
q_4.check()

# Uncomment to see our solution (your code may look different!)
#q_4.solution()

### 5) Where are the protected areas in South America? (Part 1)

It looks like all of the birds end up somewhere in South America.  But are they going to protected areas?

In the next code cell, you'll create a GeoDataFrame `protected_areas` containing the locations of all of the protected areas in South America.  The corresponding shapefile is located at filepath `protected_filepath`.

In [45]:
# Path of the shapefile to load
protected_filepath = "../input/geospatial-learn-course-data/SAPA_Aug2019-shapefile/SAPA_Aug2019-shapefile/SAPA_Aug2019-shapefile-polygons.shp"

# Your code here
protected_areas = gpd.read_file(protected_filepath)

# Check your answer
q_5.check()

In [46]:
# Lines below will give you a hint or solution code
#q_5.hint()
#q_5.solution()

### 6) Where are the protected areas in South America? (Part 2)

Create a plot that uses the `protected_areas` GeoDataFrame to show the locations of the protected areas in South America.  (_You'll notice that some protected areas are on land, while others are in marine waters._)

In [49]:
# Country boundaries in South America
south_america = americas.loc[americas['continent']=='South America']

# Your code here: plot protected areas in South America
ax = south_america.plot(figsize=(10, 10), color='whitesmoke', edgecolor='black')

protected_areas.plot(color='red', markersize=20, ax=ax, alpha = 0.5)

# Uncomment to see a hint
#q_6.hint()

In [50]:
# Get credit for your work after you have created a map
q_6.check()

# Uncomment to see our solution (your code may look different!)
#q_6.solution()

### 7) What percentage of South America is protected?

You're interested in determining what percentage of South America is protected, so that you know how much of South America is suitable for the birds.  

As a first step, you calculate the total area of all protected lands in South America (not including marine area).  To do this, you use the "REP_AREA" and "REP_M_AREA" columns, which contain the total area and total marine area, respectively, in square kilometers.

Run the code cell below without changes.

In [52]:
P_Area = sum(protected_areas['REP_AREA']-protected_areas['REP_M_AREA'])
print("South America has {} square kilometers of protected areas.".format(P_Area))
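
The subtraction in the cell above can be checked by hand on a toy table. The rows, the `NAME` column, and the numbers below are all hypothetical; only the "REP_AREA" and "REP_M_AREA" column names come from the exercise.

```python
import pandas as pd

# Hypothetical protected areas, with areas in square kilometers
toy_areas = pd.DataFrame({
    "NAME": ["Coastal reserve", "Inland park", "Marine sanctuary"],
    "REP_AREA": [100.0, 250.0, 80.0],    # total reported area
    "REP_M_AREA": [40.0, 0.0, 80.0],     # marine part of that area
})

# Land area per row is total area minus marine area
land_area = toy_areas["REP_AREA"] - toy_areas["REP_M_AREA"]

# Summing the per-row land areas gives the protected land total
total_land = land_area.sum()
print(total_land)  # 310.0
```

A purely marine area contributes zero to the total, while a coastal area contributes only its land portion.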

Then, to finish the calculation, you'll use the `south_america` GeoDataFrame.  

In [53]:
south_america.head()

Calculate the total area of South America by following these steps:
- Calculate the area of each country using the `area` attribute of each polygon (with EPSG 3035 as the CRS), and add up the results.  The calculated area will be in units of square meters.
- Convert your answer to have units of square kilometers.
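
The unit conversion in the second step is a division by 10**6, since one kilometer is 1,000 meters and so one square kilometer is 1,000 × 1,000 square meters. A minimal sketch with made-up per-country areas (the helper `sq_m_to_sq_km` and the numbers are hypothetical, not values from the exercise):

```python
# 1 km = 1,000 m, so 1 km^2 = 1,000 * 1,000 = 10**6 m^2
SQ_M_PER_SQ_KM = 1_000 ** 2


def sq_m_to_sq_km(area_sq_m):
    """Convert an area from square meters to square kilometers."""
    return area_sq_m / SQ_M_PER_SQ_KM


# Hypothetical per-country areas in square meters
country_areas_sq_m = [2.5e12, 7.5e11, 1.0e9]

# Add up the areas first, then convert the total once
total_sq_km = sq_m_to_sq_km(sum(country_areas_sq_m))
print(total_sq_km)  # 3251000.0
```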

Reference link: http://mwultong.blogspot.com/2008/01/km2-km2-m2-calc.html

In [64]:
# Your code here: Calculate the total area of South America (in square kilometers)
south_america_crs = south_america.geometry.to_crs(epsg=3035).area

# Total area in square meters (m**2)
sum_area = sum(south_america_crs)
sum_area

# Convert to square kilometers (km**2): divide m**2 by 10**6
totalArea = sum_area / 10**6
totalArea

# Check your answer
q_7.check()

In [65]:
# Lines below will give you a hint or solution code
#q_7.hint()
#q_7.solution()

Run the code cell below to calculate the percentage of South America that is protected.

In [66]:
# What percentage of South America is protected?
percentage_protected = P_Area/totalArea
print('Approximately {}% of South America is protected.'.format(round(percentage_protected*100, 2)))

### 8) Where are the birds in South America?

So, are the birds in protected areas?  

Create a plot that shows every location in South America where any of the birds was recorded.  Also plot the locations of all protected areas in South America.

To exclude protected areas that are purely marine areas (with no land component), you can use the "MARINE" column (and plot only the rows in `protected_areas[protected_areas['MARINE']!='2']`, instead of every row in the `protected_areas` GeoDataFrame).
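
A small sketch of that filter on hypothetical rows follows. It assumes, as the `'2'` in the comparison above suggests, that the "MARINE" column stores its codes as strings. The `NAME` column, the row values, and the reading of `"0"` and `"1"` as land and coastal are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical protected areas; "2" is the purely marine code per the text above
toy_protected = pd.DataFrame({
    "NAME": ["Inland park", "Coastal reserve", "Marine sanctuary"],
    "MARINE": ["0", "1", "2"],  # assumption: codes stored as strings
})

# Boolean mask that is False only for purely marine rows
has_land = toy_protected["MARINE"] != "2"

# Keep every area with some land component
not_purely_marine = toy_protected[has_land]
print(not_purely_marine["NAME"].tolist())
```

The same boolean-mask indexing works on a GeoDataFrame, so the filtered result can be passed straight to `.plot()`.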

In [69]:
# Your code here
ax = south_america.plot(figsize=(8,8), color='whitesmoke', edgecolor='black')

# Plot the bird locations south of the equator (latitude < 0) as a proxy for South America
birds[birds.geometry.y < 0].plot(ax=ax, color='red', alpha=0.5, markersize=10, zorder=2)

# Exclude protected areas that are purely marine (no land component)
protected_areas[protected_areas['MARINE']!='2'].plot(ax=ax, alpha=0.5, zorder=1)

# Uncomment to see a hint
#q_8.hint()

In [70]:
# Get credit for your work after you have created a map
q_8.check()

# Uncomment to see our solution (your code may look different!)
#q_8.solution()

# Keep going

Create stunning **[interactive maps](https://www.kaggle.com/alexisbcook/interactive-maps)** with your geospatial data.

---




*Have questions or comments? Visit the [course discussion forum](https://www.kaggle.com/learn/geospatial-analysis/discussion) to chat with other learners.*