A common operation in spatial analysis is to take non-spatial data, such as CSV files, and create a spatial dataset from it using coordinate information contained in the file. GeoPandas provides a convenient way to take data from a delimited-text file, create geometry, and write the results as a spatial dataset.


We will read a tab-delimited file of places, filter it to a single feature class, create a GeoDataFrame, and export it as a GeoPackage file.

In [26]:
import os
import pandas as pd
import geopandas as gpd

In [27]:
data_pkg_path = 'data/geonames/'
filename = 'US.txt'
path = os.path.join(data_pkg_path, filename)

In [28]:
path

'data/geonames/US.txt'

**Reading Tab-Delimited Files**
The source data comes from GeoNames, a free and open database of geographic names of the world. It is a huge database, containing millions of records for large countries. The data is distributed as country-level text files in a tab-delimited format. The files do not contain a header row with column names, so we need to specify them when reading the data. The data format is described in detail on the Data Export page.

We specify the separator as \t (tab) as an argument to the pandas read_csv() function. Note that the file for the USA has more than 2M records.

In [29]:
column_names = [
    'geonameid', 'name', 'asciiname', 'alternatenames', 'latitude', 'longitude', 'feature class', 'feature code', 'country code', 'cc2', 'admin1 code', 'admin2 code', 'admin3 code', 'admin4 code', 'population', 'elevation', 'dem', 'timezone', 'modification date'
]

df = pd.read_csv(path, sep='\t', names=column_names)
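For a file of this size it can help to load only the columns you need and to set column types up front. The sketch below is illustrative and not part of the original workflow: it reads a tiny in-memory sample (two rows modeled on the output shown in this tutorial, with an assumed admin2 code of '016') instead of US.txt, and the names rows, sample, wanted, and df_small are hypothetical.

```python
import io
import pandas as pd

column_names = [
    'geonameid', 'name', 'asciiname', 'alternatenames', 'latitude',
    'longitude', 'feature class', 'feature code', 'country code', 'cc2',
    'admin1 code', 'admin2 code', 'admin3 code', 'admin4 code',
    'population', 'elevation', 'dem', 'timezone', 'modification date',
]

# Hypothetical sample rows in the GeoNames layout (19 tab-separated fields, no header).
rows = [
    ['4045410', 'Vulcan Point', 'Vulcan Point', 'Volcan Point,Vulcan Point',
     '52.10222', '177.53889', 'T', 'CAPE', 'US', '', 'AK', '016', '', '',
     '0', '', '-9999', 'America/Adak', '2014-10-08'],
    ['4045411', 'Tropical Ridge', 'Tropical Ridge', '',
     '51.99167', '177.50833', 'T', 'RDGE', 'US', '', 'AK', '016', '', '',
     '0', '', '267', 'America/Adak', '2010-01-30'],
]
sample = '\n'.join('\t'.join(r) for r in rows) + '\n'

# Columns to keep; everything else is skipped while parsing.
wanted = ['name', 'latitude', 'longitude', 'feature class',
          'feature code', 'admin2 code', 'dem']

df_small = pd.read_csv(
    io.StringIO(sample),           # stand-in for the real file path
    sep='\t',
    header=None,                   # the file has no header row
    names=column_names,
    usecols=wanted,
    dtype={'admin2 code': str},    # read codes as text, not numbers
)
print(df_small.dtypes)
```

Reading admin2 code as a string keeps leading zeros that a numeric dtype would drop.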



In [30]:
# Keep only the rows in feature class 'T' (mountain, hill, rock, ...)
filtered = df[df['feature class'] == 'T']

# Preview a subset of the columns
filtered[['name', 'latitude', 'longitude', 'dem', 'feature class']]

Unnamed: 0,name,latitude,longitude,dem,feature class
19,Vulcan Point,52.10222,177.53889,-9999,T
20,Tropical Ridge,51.99167,177.50833,267,T
21,Thirty-Seven Hill,52.84528,173.15278,193,T
24,Square Point,52.86120,173.33679,30,T
25,Square Bluff,51.65000,178.70000,-9999,T
...,...,...,...,...,...
2241317,ʻŌnūnui,24.99906,-167.99891,-9999,T
2241318,ʻŌnūiki,25.00004,-168.00037,-9999,T
2241320,Flagpole Hill,23.57550,-164.70266,-9999,T
2241321,Bowl Hill,23.57512,-164.69836,-9999,T


In [31]:
filtered.head()

Unnamed: 0,geonameid,name,asciiname,alternatenames,latitude,longitude,feature class,feature code,country code,cc2,admin1 code,admin2 code,admin3 code,admin4 code,population,elevation,dem,timezone,modification date
19,4045410,Vulcan Point,Vulcan Point,"Volcan Point,Vulcan Point",52.10222,177.53889,T,CAPE,US,,AK,16.0,,,0,,-9999,America/Adak,2014-10-08
20,4045411,Tropical Ridge,Tropical Ridge,,51.99167,177.50833,T,RDGE,US,,AK,16.0,,,0,,267,America/Adak,2010-01-30
21,4045412,Thirty-Seven Hill,Thirty-Seven Hill,,52.84528,173.15278,T,MT,US,,AK,16.0,,,0,,193,America/Adak,2014-10-08
24,4045415,Square Point,Square Point,,52.8612,173.33679,T,CAPE,US,,AK,16.0,,,0,,30,America/Adak,2014-11-17
25,4045416,Square Bluff,Square Bluff,,51.65,178.7,T,CLF,US,,AK,16.0,,,0,,-9999,America/Adak,2014-10-08


**Filtering Data**

The input data has a feature class column that assigns each place to one of 9 feature classes. We can select all rows with the value T, which is the class for mountain, hill, rock...

In [32]:
mountains = df[df['feature class']=='T']
print(mountains.head()[['name', 'latitude', 'longitude', 'dem', 'feature class']])

                 name  latitude  longitude   dem feature class
19       Vulcan Point  52.10222  177.53889 -9999             T
20     Tropical Ridge  51.99167  177.50833   267             T
21  Thirty-Seven Hill  52.84528  173.15278   193             T
24       Square Point  52.86120  173.33679    30             T
25       Square Bluff  51.65000  178.70000 -9999             T


In [33]:
type(mountains)

pandas.core.frame.DataFrame
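The same boolean-mask approach extends to more than one condition. The following is a minimal sketch on a small hypothetical DataFrame (the places frame and the 'Example Town' row are made up for illustration). It also assumes that a dem value of -9999 marks a missing elevation, which is how the values in the output above appear to be used.

```python
import pandas as pd

# Hypothetical sample; the first three rows are modeled on the output above.
places = pd.DataFrame({
    'name': ['Vulcan Point', 'Tropical Ridge', 'Thirty-Seven Hill', 'Example Town'],
    'feature class': ['T', 'T', 'T', 'P'],
    'feature code': ['CAPE', 'RDGE', 'MT', 'PPL'],
    'dem': [-9999, 267, 193, 12],
})

# How many rows fall in each feature class?
print(places['feature class'].value_counts())

# Combine conditions with & ; each mask is a boolean Series.
is_terrain = places['feature class'] == 'T'
has_dem = places['dem'] != -9999   # assumption: -9999 means "no elevation value"

# .copy() gives an independent DataFrame rather than a view-like slice of places.
subset = places[is_terrain & has_dem].copy()
print(subset[['name', 'feature code', 'dem']])
```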

**Creating Geometry**

GeoPandas has a convenient function *points_from_xy()* that creates an array of point geometries from X and Y coordinates (here, longitude is X and latitude is Y). We can then take a Pandas DataFrame and create a GeoDataFrame by specifying a CRS and the geometry column.

In [34]:
geometry = gpd.points_from_xy(mountains.longitude, mountains.latitude)
gdf = gpd.GeoDataFrame(mountains, crs='EPSG:4326', geometry=geometry)
print(gdf.info())

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 225523 entries, 19 to 2241322
Data columns (total 20 columns):
 #   Column             Non-Null Count   Dtype   
---  ------             --------------   -----   
 0   geonameid          225523 non-null  int64   
 1   name               225523 non-null  object  
 2   asciiname          225523 non-null  object  
 3   alternatenames     34087 non-null   object  
 4   latitude           225523 non-null  float64 
 5   longitude          225523 non-null  float64 
 6   feature class      225523 non-null  object  
 7   feature code       225523 non-null  object  
 8   country code       225523 non-null  object  
 9   cc2                5 non-null       object  
 10  admin1 code        225520 non-null  object  
 11  admin2 code        225344 non-null  float64 
 12  admin3 code        21882 non-null   float64 
 13  admin4 code        0 non-null       float64 
 14  population         225523 non-null  int64   
 15  elevation          224363 non

In [35]:
geometry

<GeometryArray>
[ <POINT (177.539 52.102)>,  <POINT (177.508 51.992)>,
  <POINT (173.153 52.845)>,  <POINT (173.337 52.861)>,
     <POINT (178.7 51.65)>,  <POINT (179.728 51.922)>,
  <POINT (173.104 52.798)>,  <POINT (172.911 52.976)>,
  <POINT (173.218 52.882)>,  <POINT (177.371 51.975)>,
 ...
 <POINT (-111.742 40.657)>,  <POINT (-76.659 34.687)>,
   <POINT (-118.67 37.45)>, <POINT (-115.148 36.196)>,
 <POINT (-121.902 37.901)>, <POINT (-167.999 24.999)>,
         <POINT (-168 25)>, <POINT (-164.703 23.576)>,
 <POINT (-164.698 23.575)>, <POINT (-164.696 23.575)>]
Length: 225523, dtype: geometry
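A frequent mistake at this step is swapping latitude and longitude. As a small sanity-check sketch (using a hypothetical two-row frame named pts rather than the full dataset), the coordinates stored in the geometry can be compared back against the source columns:

```python
import pandas as pd
import geopandas as gpd

# Hypothetical two-row sample with coordinates taken from the output above.
pts = pd.DataFrame({
    'name': ['Vulcan Point', 'Tropical Ridge'],
    'latitude': [52.10222, 51.99167],
    'longitude': [177.53889, 177.50833],
})

# x is longitude and y is latitude.
geom = gpd.points_from_xy(x=pts['longitude'], y=pts['latitude'])
pts_gdf = gpd.GeoDataFrame(pts, geometry=geom, crs='EPSG:4326')

# The geometry's x/y should match the original columns,
# and latitude should stay within the valid range.
x_matches = (pts_gdf.geometry.x == pts['longitude']).all()
y_in_range = pts_gdf.geometry.y.between(-90, 90).all()
print(x_matches, y_in_range, pts_gdf.crs.to_epsg())
```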

In [36]:
gdf

Unnamed: 0,geonameid,name,asciiname,alternatenames,latitude,longitude,feature class,feature code,country code,cc2,admin1 code,admin2 code,admin3 code,admin4 code,population,elevation,dem,timezone,modification date,geometry
19,4045410,Vulcan Point,Vulcan Point,"Volcan Point,Vulcan Point",52.10222,177.53889,T,CAPE,US,,AK,16.0,,,0,,-9999,America/Adak,2014-10-08,POINT (177.53889 52.10222)
20,4045411,Tropical Ridge,Tropical Ridge,,51.99167,177.50833,T,RDGE,US,,AK,16.0,,,0,,267,America/Adak,2010-01-30,POINT (177.50833 51.99167)
21,4045412,Thirty-Seven Hill,Thirty-Seven Hill,,52.84528,173.15278,T,MT,US,,AK,16.0,,,0,,193,America/Adak,2014-10-08,POINT (173.15278 52.84528)
24,4045415,Square Point,Square Point,,52.86120,173.33679,T,CAPE,US,,AK,16.0,,,0,,30,America/Adak,2014-11-17,POINT (173.33679 52.8612)
25,4045416,Square Bluff,Square Bluff,,51.65000,178.70000,T,CLF,US,,AK,16.0,,,0,,-9999,America/Adak,2014-10-08,POINT (178.7 51.65)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2241317,13535685,ʻŌnūnui,'Onunui,,24.99906,-167.99891,T,ISL,US,,HI,,,,0,,-9999,,2025-10-28,POINT (-167.99891 24.99906)
2241318,13535686,ʻŌnūiki,'Onuiki,,25.00004,-168.00037,T,ISL,US,,HI,,,,0,,-9999,,2025-10-28,POINT (-168.00037 25.00004)
2241320,13535711,Flagpole Hill,Flagpole Hill,,23.57550,-164.70266,T,HLL,US,,HI,3.0,,,0,,-9999,,2025-10-31,POINT (-164.70266 23.5755)
2241321,13535712,Bowl Hill,Bowl Hill,Bryan Peak,23.57512,-164.69836,T,HLL,US,,HI,3.0,,,0,,-9999,,2025-10-31,POINT (-164.69836 23.57512)


**Writing Files**

We can write the resulting GeoDataFrame to any of the supported vector data formats. Here we are writing it as a new GeoPackage file.

You can open the resulting GeoPackage in a GIS and view the data.

In [53]:
output_dir = 'output'
output_filename = 'mountains.gpkg'
output_path = os.path.join(output_dir, output_filename)
output_path

'output\\mountains.gpkg'

In [54]:
# Write the GeoDataFrame as a GeoPackage, the format named in the text above
gdf.to_file(filename=output_path, driver='GPKG')
print('Successfully written output file at {}'.format(output_path))

Successfully written output file at output\mountains.gpkg
