# Creating Spatial Data

A common operation in spatial analysis is to take non-spatial data, such as CSV files, and create a spatial dataset from it using the coordinate information contained in the file. GeoPandas provides a convenient way to read data from a delimited-text file, create geometries, and write the results as a spatial dataset.

We will read a tab-delimited file of places, filter it to a single feature class, create a GeoDataFrame and export it as a GeoPackage file.

![](https://github.com/spatialthoughts/python-foundation-web/blob/master/images/python_foundation/geonames_mountains.png?raw=1)

## Downloading Data

For this exercise we need multiple text files, which are available to download as a single zip file.

In [None]:
# 1. Check/Install onedrivedownloader
try:
    import onedrivedownloader
except ImportError:
    print("Installing onedrivedownloader...")
    %pip install onedrivedownloader
    import onedrivedownloader
print("✓ Module onedrivedownloader ready.")

# 2. Define the OneDrive share link to geonames.zip, download, and extract
from onedrivedownloader import download
link = "https://etsu365-my.sharepoint.com/:u:/g/personal/ernenwei_etsu_edu/IQCjzFVUd6TrQZ3yLLyVRKQGATcvorLx8Jbc7VqNT5q0RTc?e=caFP3T"
download(link, filename="geonames.zip", unzip=True, unzip_path="./data/geonames")

In [None]:
# We import modules and set up the path to US.txt
import os
import pandas as pd
import geopandas as gpd

folder_path = 'data/geonames'
filename = 'US.txt'
path = os.path.join(folder_path, filename)

path

## Reading Tab-Delimited Files

The source data comes from [GeoNames](https://en.wikipedia.org/wiki/GeoNames) - a free and open database of geographic names of the world. It is a huge database containing millions of records per country. The data is distributed as country-level text files in a tab-delimited format. The files do not contain a header row with column names, so we need to specify them when reading the data. The data format is described in detail on the [Data Export](https://www.geonames.org/export/) page.

We specify the separator as **\\t** (tab) as an argument to the `read_csv()` method. Note that the file for USA has more than 2 million records.
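
To see how `sep` and `names` work together without downloading anything, here is a minimal sketch that parses a small in-memory sample. The two rows are made up for illustration and contain only the first six GeoNames columns; `io.StringIO` stands in for the file on disk.

```python
import io

import pandas as pd

# Two made-up, GeoNames-style rows (first six columns only).
# Fields are separated by tab characters and there is no header row.
sample = (
    "1\tMount Example\tMount Example\t\t35.5\t-83.5\n"
    "2\tExample Knob\tExample Knob\t\t36.1\t-82.4\n"
)

# Column names must be supplied because the data has no header row
names = ['geonameid', 'name', 'asciiname', 'alternatenames', 'latitude', 'longitude']

df = pd.read_csv(io.StringIO(sample), sep='\t', names=names)

# Empty fields (here: alternatenames) are read as NaN
print(df[['name', 'latitude', 'longitude']])
```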

In [None]:
column_names = [
    'geonameid', 'name', 'asciiname', 'alternatenames',
    'latitude', 'longitude', 'feature class', 'feature code',
    'country code', 'cc2', 'admin1 code', 'admin2 code',
    'admin3 code', 'admin4 code', 'population', 'elevation',
    'dem', 'timezone', 'modification date'
]

df = pd.read_csv(path, sep='\t', names=column_names)
df.info()

## Filtering Data

The input data has a column `feature class` that categorizes places into [9 feature classes](https://www.geonames.org/export/codes.html). We can select all rows in the `T` feature class, which is described as *mountain, hill, rock...*

In [None]:
mountains = df[df['feature class']=='T']
mountains.head()[['name', 'latitude', 'longitude', 'dem','feature class']]
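
Boolean indexing is what makes this filter work: the comparison produces a True/False Series, and indexing the DataFrame with it keeps only the matching rows. The sketch below uses a tiny hypothetical table; `isin()` and `value_counts()` are standard pandas methods, shown here as optional extras rather than steps the tutorial requires.

```python
import pandas as pd

# A tiny, hypothetical stand-in for the GeoNames table
df = pd.DataFrame({
    'name': ['Peak A', 'Lake B', 'Town C', 'Ridge D'],
    'feature class': ['T', 'H', 'P', 'T'],
})

# The comparison yields a boolean Series; indexing with it keeps the True rows.
# .copy() gives an independent DataFrame, so adding columns to it later
# does not trigger pandas' SettingWithCopyWarning.
mask = df['feature class'] == 'T'
mountains = df[mask].copy()

# isin() selects several classes at once: here terrain (T) and hydrographic (H) features
terrain_or_water = df[df['feature class'].isin(['T', 'H'])]

# value_counts() shows how many rows fall into each class
counts = df['feature class'].value_counts()
print(counts)
```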

## Creating Geometries

GeoPandas has a convenient function `points_from_xy()` that creates point geometries from X and Y coordinates; here X is the longitude and Y is the latitude. We can then create a GeoDataFrame from the pandas DataFrame by specifying a CRS and the geometry.

In [None]:
geometry = gpd.points_from_xy(mountains.longitude, mountains.latitude)
gdf = gpd.GeoDataFrame(mountains, crs='EPSG:4326', geometry=geometry)
gdf
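
A common pitfall is the argument order of `points_from_xy()`: X comes first and corresponds to longitude, Y to latitude. The sketch below, using two made-up points, also passes the CRS to `points_from_xy()` itself through its optional `crs` parameter, which the GeoDataFrame then picks up; passing `crs=` to `GeoDataFrame` as in the cell above works equally well.

```python
import pandas as pd
import geopandas as gpd

# Two made-up points
df = pd.DataFrame({
    'name': ['Peak A', 'Peak B'],
    'latitude': [35.5, 36.1],
    'longitude': [-83.5, -82.4],
})

# x is longitude and y is latitude; swapping them is a common mistake
geometry = gpd.points_from_xy(df['longitude'], df['latitude'], crs='EPSG:4326')

# The GeoDataFrame takes its CRS from the geometry array
gdf = gpd.GeoDataFrame(df, geometry=geometry)

print(gdf.crs)
print(gdf.geometry.x.tolist(), gdf.geometry.y.tolist())
```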

## Writing Files

We can write the resulting GeoDataFrame to any of the supported vector data formats. The format is inferred from the file extension: use `.shp` if you want to save the results as a shapefile. Here we write it as a new GeoPackage file, so we use the `.gpkg` extension.

You can open the resulting GeoPackage in a GIS to view the data.

In [None]:
output_dir = 'output'
output_filename = 'mountains.gpkg'
output_path = os.path.join(output_dir, output_filename)
os.makedirs(output_dir, exist_ok=True)

gdf.to_file(filename=output_path, encoding='utf-8')
print('Successfully written output file at {}'.format(output_path))

## Exercise

The GeoNames data package contains text files for several countries in the `data/geonames/` folder. The code below reads all the files, extracts the mountain features, and merges them into a single DataFrame using the `pd.concat()` function.

The exercise is to convert the merged DataFrame to a GeoDataFrame and save it as a shapefile.
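
One detail to be aware of when merging: each per-country DataFrame keeps the row labels it had in its source file, so `pd.concat()` can produce duplicate index values. A minimal sketch with two hypothetical frames shows the difference `ignore_index=True` makes.

```python
import pandas as pd

# Two hypothetical per-country results, each keeping its original row labels
us = pd.DataFrame({'name': ['Peak A', 'Peak B'], 'country code': ['US', 'US']}, index=[10, 42])
ca = pd.DataFrame({'name': ['Peak C'], 'country code': ['CA']}, index=[10])

# Default behaviour: the original labels are kept, so 10 appears twice
stacked = pd.concat([us, ca])

# ignore_index=True discards the old labels and numbers the rows 0..n-1
merged = pd.concat([us, ca], ignore_index=True)

print(stacked.index.tolist())
print(merged.index.tolist())
```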


The DtypeWarning, UserWarning, and RuntimeWarning messages you may see are OK. You can get rid of the first one by adjusting the `pd.read_csv()` call if you like (for example with `low_memory=False` or an explicit `dtype` mapping). The other two stem from limitations of the shapefile format and can only be avoided entirely by using a different file format.
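
Shapefiles store attributes in a dBase table whose field names are limited to 10 characters, so long names such as `modification date` cannot be kept as-is when writing. Renaming the columns yourself before writing at least lets you choose the shortened names. `shapefile_safe_names()` below is a hypothetical helper, not part of GeoPandas: it replaces spaces with underscores, truncates to 10 characters, and adds a numeric suffix when two shortened names collide.

```python
import pandas as pd


def shapefile_safe_names(columns, max_len=10):
    """Hypothetical helper: map each name to a space-free name of at most max_len characters."""
    mapping = {}
    used = set()
    for col in columns:
        # Replace spaces and truncate to the dBase field-name limit
        base = str(col).replace(' ', '_')[:max_len]
        candidate = base
        counter = 1
        # If truncation produced a name already in use, append a numeric suffix
        while candidate in used:
            suffix = str(counter)
            candidate = base[:max_len - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        mapping[col] = candidate
    return mapping


column_names = [
    'geonameid', 'name', 'asciiname', 'alternatenames',
    'latitude', 'longitude', 'feature class', 'feature code',
    'country code', 'cc2', 'admin1 code', 'admin2 code',
    'admin3 code', 'admin4 code', 'population', 'elevation',
    'dem', 'timezone', 'modification date'
]
mapping = shapefile_safe_names(column_names)

# Apply the mapping to an (empty) DataFrame that has the GeoNames columns
renamed = pd.DataFrame(columns=column_names).rename(columns=mapping)
print(list(renamed.columns))
```

The same call works on a GeoDataFrame, e.g. `gdf.rename(columns=shapefile_safe_names(gdf.columns))`; the `geometry` column name is already short, so it is left unchanged.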

In [None]:
import os
import pandas as pd
import geopandas as gpd

# Define the path to the data package
data_pkg_path = 'data/geonames/'
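# Keep only the .txt files (skipping any other entries), in a stable sorted order
files = sorted(f for f in os.listdir(data_pkg_path) if f.endswith('.txt'))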

# Define column names as the files do not have headers
column_names = [
    'geonameid', 'name', 'asciiname', 'alternatenames',
    'latitude', 'longitude', 'feature class', 'feature code',
    'country code', 'cc2', 'admin1 code', 'admin2 code',
    'admin3 code', 'admin4 code', 'population', 'elevation',
    'dem', 'timezone', 'modification date'
]

dataframes = []
# Loop through each file, read it, filter for mountains, and append to list
for file in files:
    path = os.path.join(data_pkg_path, file)
    df = pd.read_csv(path, sep='\t', names=column_names, low_memory=False)
    mountains = df[df['feature class']=='T'] # Filter for 'T' feature class (mountains)
    dataframes.append(mountains)

# Concatenate all mountain DataFrames into a single DataFrame with a fresh 0..n-1 index
merged = pd.concat(dataframes, ignore_index=True)







----