**This notebook is an exercise in the [Geospatial Analysis](https://www.kaggle.com/learn/geospatial-analysis) course.  You can reference the tutorial at [this link](https://www.kaggle.com/alexisbcook/coordinate-reference-systems).**

---


# "kaggle Geospatial Analysis (2)"
> "exercise-coordinate-reference-systems"

- toc: true
- branch: master
- badges: true
- comments: true
- author: EunSu Cho
- categories: [jupyter, python]

# Introduction

You are a bird conservation expert and want to understand migration patterns of purple martins.  In your research, you discover that these birds typically spend the summer breeding season in the eastern United States, and then migrate to South America for the winter.  But since this bird is under threat of endangerment, you'd like to take a closer look at the locations that these birds are more likely to visit.

<center>
<img src="https://i.imgur.com/qQcS0KM.png" width="1000"><br/>
</center>

There are several [protected areas](https://www.iucn.org/theme/protected-areas/about) in South America, which operate under special regulations to ensure that species that migrate (or live) there have the best opportunity to thrive.  You'd like to know if purple martins tend to visit these areas.  To answer this question, you'll use some recently collected data that tracks the year-round location of eleven different birds.

Before you get started, run the code cell below to set everything up.

```python
import pandas as pd
import geopandas as gpd

from shapely.geometry import LineString

from learntools.core import binder
binder.bind(globals())
from learntools.geospatial.ex2 import *
```

# Exercises

### 1) Load the data.

Run the next code cell (without changes) to load the GPS data into a pandas DataFrame `birds_df`.  

```python
# Load the data and print the first 5 rows
birds_df = pd.read_csv("../input/geospatial-learn-course-data/purple_martin.csv", parse_dates=['timestamp'])
print("There are {} different birds in the dataset.".format(birds_df["tag-local-identifier"].nunique()))
birds_df.head()
```

There are 11 birds in the dataset, where each bird is identified by a unique value in the "tag-local-identifier" column.  Each bird has several measurements, collected at different times of the year.
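
A `groupby` on the identifier column shows how the measurements are spread across birds. Below is a minimal sketch that uses a tiny made-up DataFrame in place of `birds_df`. The identifiers 101/102 and the coordinates are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for birds_df (made-up identifiers and coordinates)
sample_df = pd.DataFrame({
    "tag-local-identifier": [101, 101, 101, 102, 102],
    "location-long": [-88.1, -88.0, -60.2, -86.5, -58.3],
    "location-lat": [42.0, 41.9, -3.1, 30.2, -9.8],
})

# Number of distinct birds (same call as in the notebook)
n_birds = sample_df["tag-local-identifier"].nunique()

# Number of measurements recorded for each bird
counts = sample_df.groupby("tag-local-identifier").size()

print(n_birds)
print(counts)
```

On the real data, `birds_df.groupby("tag-local-identifier").size()` should list 11 identifiers, one per bird.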

Use the next code cell to create a GeoDataFrame `birds`.  
- `birds` should have all of the columns from `birds_df`, along with a "geometry" column that contains Point objects with (longitude, latitude) locations.  
- Set the CRS of `birds` to `{'init': 'epsg:4326'}`.

```python
# Your code here: Create the GeoDataFrame
birds = gpd.GeoDataFrame(birds_df, geometry=gpd.points_from_xy(birds_df["location-long"], birds_df["location-lat"]))

# Your code here: Set the CRS to {'init': 'epsg:4326'}
birds.crs = {'init': 'epsg:4326'}

# Check your answer
q_1.check()
```
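
The `{'init': 'epsg:4326'}` dict follows the exercise instructions, but recent pyproj releases warn that the `+init` syntax is deprecated. The sketch below builds the same kind of GeoDataFrame and passes the CRS as an `"EPSG:4326"` string at creation time. The two-row DataFrame is hypothetical. Note that `gpd.points_from_xy` takes x (longitude) first, then y (latitude).

```python
import pandas as pd
import geopandas as gpd

# Hypothetical two-row stand-in for birds_df
df = pd.DataFrame({
    "tag-local-identifier": [101, 101],
    "location-long": [-88.0, -60.0],
    "location-lat": [42.0, -3.0],
})

# x = longitude, y = latitude
geom = gpd.points_from_xy(df["location-long"], df["location-lat"])

# crs="EPSG:4326" sets WGS 84 lon/lat without the deprecated {'init': ...} dict
birds_sketch = gpd.GeoDataFrame(df, geometry=geom, crs="EPSG:4326")

print(birds_sketch.crs.to_epsg())
```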

```python
# Lines below will give you a hint or solution code
#q_1.hint()
#q_1.solution()
```

### 2) Plot the data.

Next, we load in the `'naturalearth_lowres'` dataset from GeoPandas, and set `americas` to a GeoDataFrame containing the boundaries of all countries in the Americas (both North and South America).  Run the next code cell without changes.

```python
# Load a GeoDataFrame with country boundaries in North/South America, print the first 5 rows
# Note: gpd.datasets was removed in GeoPandas 1.0, so this line needs an older GeoPandas release
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
americas = world.loc[world['continent'].isin(['North America', 'South America'])]
americas.head()
```

Use the next code cell to create a single plot that shows both: (1) the country boundaries in the `americas` GeoDataFrame, and (2) all of the points in the `birds` GeoDataFrame.  

Don't worry about any special styling here; just create a preliminary plot, as a quick sanity check that all of the data was loaded properly.  In particular, you don't have to worry about color-coding the points to differentiate between birds, and you don't have to differentiate starting points from ending points.  We'll do that in the next part of the exercise.

```python
# Your code here
ax = americas.plot(figsize=(8,8), color='whitesmoke', linestyle=':', edgecolor='black')
# birds is already in EPSG:4326 (WGS 84 lon/lat), so it can be overlaid without reprojecting
birds.plot(ax=ax, markersize=2)

# Uncomment to see a hint
#q_2.hint()
```

```python
# Get credit for your work after you have created a map
q_2.check()

# Uncomment to see our solution (your code may look different!)
#q_2.solution()
```

### 3) Where does each bird start and end its journey? (Part 1)

Now, we're ready to look more closely at each bird's path.  Run the next code cell to create two GeoDataFrames:
- `path_gdf` contains LineString objects that show the path of each bird.  It uses the `LineString()` constructor to create a LineString object from a list of Point objects.
- `start_gdf` contains the starting points for each bird.

```python
# GeoDataFrame showing path for each bird
# (assumes each bird's rows are already in chronological order)
path_df = birds.groupby("tag-local-identifier")['geometry'].apply(list).apply(lambda x: LineString(x)).reset_index()
path_gdf = gpd.GeoDataFrame(path_df, geometry=path_df.geometry)
path_gdf.crs = {'init' :'epsg:4326'}

# GeoDataFrame showing starting point for each bird
start_df = birds.groupby("tag-local-identifier")['geometry'].apply(list).apply(lambda x: x[0]).reset_index()
start_gdf = gpd.GeoDataFrame(start_df, geometry=start_df.geometry)
start_gdf.crs = {'init' :'epsg:4326'}

# Show first five rows of GeoDataFrame
start_gdf.head()
```
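
`x[0]` (and later `x[-1]`) return the true first and last fixes only if each bird's rows are already in time order. The cell above does not sort, so that order is an assumption about the CSV. The sketch below sorts on the `timestamp` column (parsed in step 1) before building the paths. Its rows are made up and deliberately out of order.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical rows for one bird, deliberately out of time order
df = pd.DataFrame({
    "tag-local-identifier": [101, 101, 101],
    "timestamp": pd.to_datetime(["2014-08-15", "2014-05-01", "2015-01-10"]),
    "location-long": [-75.0, -88.0, -60.0],
    "location-lat": [20.0, 42.0, -3.0],
})
birds_sketch = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df["location-long"], df["location-lat"]), crs="EPSG:4326"
)

# Sort chronologically within each bird so paths follow the flight order
ordered = birds_sketch.sort_values(["tag-local-identifier", "timestamp"])

# Same pattern as the notebook: list of Points per bird -> one LineString per bird
paths = ordered.groupby("tag-local-identifier")["geometry"].apply(list).apply(LineString)
path_sketch = gpd.GeoDataFrame(paths.reset_index(), geometry="geometry", crs="EPSG:4326")

print(list(path_sketch.geometry.iloc[0].coords))
# [(-88.0, 42.0), (-75.0, 20.0), (-60.0, -3.0)]
```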

Use the next code cell to create a GeoDataFrame `end_gdf` containing the final location of each bird.  
- The format should be identical to that of `start_gdf`, with two columns ("tag-local-identifier" and "geometry"), where the "geometry" column contains Point objects.
- Set the CRS of `end_gdf` to `{'init': 'epsg:4326'}`.

```python
# Your code here
end_df = birds.groupby("tag-local-identifier")['geometry'].apply(list).apply(lambda x: x[-1]).reset_index()
end_gdf = gpd.GeoDataFrame(end_df, geometry=end_df.geometry)
end_gdf.crs = {'init': 'epsg:4326'}

# Check your answer
q_3.check()
```
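
Another way to get one row per bird uses `drop_duplicates` with `keep="last"` on the identifier column. This is an alternative sketch, not the course's solution. Like `x[-1]`, it assumes the rows are already in chronological order. The data below is hypothetical.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical fixes for two birds, already in time order
df = pd.DataFrame({
    "tag-local-identifier": [101, 101, 102, 102, 102],
    "location-long": [-88.0, -60.0, -86.0, -70.0, -58.0],
    "location-lat": [42.0, -3.0, 30.0, 5.0, -10.0],
})
birds_sketch = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df["location-long"], df["location-lat"]), crs="EPSG:4326"
)

# keep="last" retains each bird's final row; keep="first" would give the starting points
end_sketch = (
    birds_sketch.drop_duplicates(subset="tag-local-identifier", keep="last")
    [["tag-local-identifier", "geometry"]]
    .reset_index(drop=True)
)
print(end_sketch)
```

Selecting the two columns from a GeoDataFrame keeps its CRS, so no separate CRS assignment is needed here.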

```python
# Lines below will give you a hint or solution code
q_3.hint()
q_3.solution()
```

### 4) Where does each bird start and end its journey? (Part 2)

Use the GeoDataFrames from the question above (`path_gdf`, `start_gdf`, and `end_gdf`) to visualize the paths of all birds on a single map.  You may also want to use the `americas` GeoDataFrame.

```python
# Your code here
ax = americas.plot(figsize=(8,8), color='whitesmoke', linestyle=':', edgecolor='black')
start_gdf.plot(ax=ax, color='red',  markersize=10)
path_gdf.plot(ax=ax, cmap='tab20b', linestyle='-', linewidth=1, zorder=1)
end_gdf.plot(ax=ax, color='black', markersize=10)

# Uncomment to see a hint
#q_4.hint()
```

```python
# Get credit for your work after you have created a map
q_4.check()

# Uncomment to see our solution (your code may look different!)
#q_4.solution()
```

### 5) Where are the protected areas in South America? (Part 1)

It looks like all of the birds end up somewhere in South America.  But are they going to protected areas?

In the next code cell, you'll create a GeoDataFrame `protected_areas` containing the locations of all of the protected areas in South America.  The corresponding shapefile is located at filepath `protected_filepath`.

```python
# Path of the shapefile to load
protected_filepath = "../input/geospatial-learn-course-data/SAPA_Aug2019-shapefile/SAPA_Aug2019-shapefile/SAPA_Aug2019-shapefile-polygons.shp"

# Your code here
protected_areas = gpd.read_file(protected_filepath)

# Check your answer
q_5.check()
```
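
Layers overlaid on one map line up only if they share a CRS, so it is worth checking `protected_areas.crs` after loading. The notebook does not show which CRS the shapefile uses. The sketch below shows the check-and-reproject pattern on a made-up layer stored in Web Mercator (EPSG:3857).

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical layer stored in Web Mercator (EPSG:3857) rather than lon/lat
layer = gpd.GeoDataFrame(geometry=[Point(0, 0)], crs="EPSG:3857")

target_crs = "EPSG:4326"  # the CRS used for birds in this notebook

# Reproject only when the layer's CRS differs from the target
if layer.crs.to_epsg() != 4326:
    layer = layer.to_crs(target_crs)

print(layer.crs.to_epsg())
print(layer.geometry.iloc[0])
```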

```python
# Lines below will give you a hint or solution code
#q_5.hint()
#q_5.solution()
```

### 6) Where are the protected areas in South America? (Part 2)

Create a plot that uses the `protected_areas` GeoDataFrame to show the locations of the protected areas in South America.  (_You'll notice that some protected areas are on land, while others are in marine waters._)

```python
# Country boundaries in South America
south_america = americas.loc[americas['continent']=='South America']

# Your code here: plot protected areas in South America
ax = south_america.plot(figsize=(8,8), color='whitesmoke', linestyle=':', edgecolor='black')
protected_areas.plot(ax=ax, color='red',  markersize=10)

# Uncomment to see a hint
#q_6.hint()
```

```python
# Get credit for your work after you have created a map
q_6.check()

# Uncomment to see our solution (your code may look different!)
#q_6.solution()
```

### 7) What percentage of South America is protected?

You're interested in determining what percentage of South America is protected, so that you know how much of South America is suitable for the birds.  

As a first step, you calculate the total area of all protected lands in South America (not including marine area).  To do this, you use the "REP_AREA" and "REP_M_AREA" columns, which contain the total area and total marine area, respectively, in square kilometers.

Run the code cell below without changes.

```python
P_Area = sum(protected_areas['REP_AREA']-protected_areas['REP_M_AREA'])
print("South America has {} square kilometers of protected areas.".format(P_Area))
```
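
The land-only area is the row-wise difference `REP_AREA - REP_M_AREA`, summed over all protected areas. The sketch below uses hypothetical values in km² and calls the pandas `.sum()` method instead of the built-in `sum`.

```python
import pandas as pd

# Hypothetical values in square kilometers, mimicking the REP_AREA / REP_M_AREA columns
areas = pd.DataFrame({
    "REP_AREA": [120.0, 500.0, 80.0],   # total area of each protected area
    "REP_M_AREA": [0.0, 450.0, 80.0],   # marine part of that area
})

# Land area per protected area, then the total over all of them
land_km2 = (areas["REP_AREA"] - areas["REP_M_AREA"]).sum()
print(land_km2)  # 170.0
```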

Then, to finish the calculation, you'll use the `south_america` GeoDataFrame.  

```python
south_america.head()
```

Calculate the total area of South America by following these steps:
- Calculate the area of each country using the `area` attribute of each polygon (with EPSG 3035 as the CRS), and add up the results.  The calculated area will be in units of square meters.
- Convert your answer to have units of square kilometers.

```python
# Your code here: Calculate the total area of South America (in square kilometers)
totalArea = sum(south_america.geometry.to_crs(epsg=3035).area) / 10**6

# Check your answer
q_7.check()
```
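
`.area` is computed in the units of the GeoSeries' current CRS, which is why the geometries are reprojected first. In EPSG:4326 the result would be in square degrees. EPSG:3035 (ETRS89-extended / LAEA Europe) is an equal-area projection with metre units, so areas come out in m², and dividing by 10**6 converts them to km². The sketch below checks that conversion on a hypothetical 1000 m × 1000 m square given directly in projected coordinates.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical 1000 m x 1000 m square in EPSG:3035 coordinates (metres)
square = gpd.GeoSeries([box(0, 0, 1000, 1000)], crs="EPSG:3035")

area_m2 = square.area.sum()   # square metres, because EPSG:3035 uses metres
area_km2 = area_m2 / 10**6    # 1 km² = 10**6 m²
print(area_km2)  # 1.0
```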

```python
# Lines below will give you a hint or solution code
q_7.hint()
q_7.solution()
```

Run the code cell below to calculate the percentage of South America that is protected.

```python
# What percentage of South America is protected?
percentage_protected = P_Area/totalArea
print('Approximately {}% of South America is protected.'.format(round(percentage_protected*100, 2)))
```
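
The percentage is the protected land area divided by the total area, times 100, and both must be in the same units (km² here). The sketch below uses a hypothetical helper `percent_protected` and made-up numbers.

```python
def percent_protected(protected_km2: float, total_km2: float) -> float:
    """Hypothetical helper: share of the total area that is protected, in percent."""
    if total_km2 <= 0:
        raise ValueError("total area must be positive")
    return 100 * protected_km2 / total_km2


print(round(percent_protected(170.0, 1000.0), 2))  # 17.0
```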

### 8) Where are the birds in South America?

So, are the birds in protected areas?  

Create a plot that shows for all birds, all of the locations where they were discovered in South America.  Also plot the locations of all protected areas in South America.

To exclude protected areas that are purely marine areas (with no land component), you can use the "MARINE" column (and plot only the rows in `protected_areas[protected_areas['MARINE']!='2']`, instead of every row in the `protected_areas` GeoDataFrame).

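```python
# Your code here
ax = south_america.plot(figsize=(8,8), color='whitesmoke', linestyle=':', edgecolor='black')
# Protected areas with a land component (MARINE == '2' marks purely marine areas)
protected_areas[protected_areas['MARINE']!='2'].plot(ax=ax, alpha=0.4, zorder=1)
# Bird locations south of the equator (latitude < 0), a rough proxy for "in South America"
birds[birds.geometry.y < 0].plot(ax=ax, color='red', alpha=0.6, markersize=10, zorder=2)

# Uncomment to see a hint
#q_8.hint()
```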
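
The map answers the question visually. A spatial join can count how many southern fixes actually fall inside a protected area with a land component; this is an addition, not part of the exercise. The sketch below uses `gpd.sjoin(..., predicate="within")` on made-up polygons and points. The `"0"` code for the land park is an assumption. The `"2"` code for purely marine areas follows the filter above.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical protected areas; MARINE codes stored as strings, as the '2' filter implies
protected = gpd.GeoDataFrame(
    {"NAME": ["land park", "sea reserve"], "MARINE": ["0", "2"]},
    geometry=[box(-70, -20, -60, -10), box(-40, -30, -30, -20)],
    crs="EPSG:4326",
)
# Hypothetical bird fixes
fixes = gpd.GeoDataFrame(
    {"tag-local-identifier": [101, 101, 102]},
    geometry=[Point(-65, -15), Point(-50, -5), Point(-88, 42)],
    crs="EPSG:4326",
)

# Drop purely marine areas and keep only fixes south of the equator (as in the plot)
land_protected = protected[protected["MARINE"] != "2"]
south_fixes = fixes[fixes.geometry.y < 0]

# Spatial join: one output row per fix that lies within a land-containing protected area
inside = gpd.sjoin(south_fixes, land_protected, how="inner", predicate="within")
print(len(inside), "of", len(south_fixes), "southern fixes fall in protected areas")
```

On the real data, the same call would take `birds[birds.geometry.y < 0]` and `protected_areas[protected_areas['MARINE']!='2']`, after making sure both layers share a CRS.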

```python
# Get credit for your work after you have created a map
q_8.check()

# Uncomment to see our solution (your code may look different!)
#q_8.solution()
```

# Keep going

Create stunning **[interactive maps](https://www.kaggle.com/alexisbcook/interactive-maps)** with your geospatial data.

---




*Have questions or comments? Visit the [course discussion forum](https://www.kaggle.com/learn/geospatial-analysis/discussion) to chat with other learners.*