# 1 Introduction
## 1.1 What is GMNS?
The General Modeling Network Specification (GMNS) is a flexible, unified format for representing multimodal traffic networks. It makes it easier for researchers to share network data and to merge data from different sources.
## 1.2 What is Pandas?
Pandas is a data analysis package for Python. Its development began at AQR Capital Management in April 2008, and it was open-sourced at the end of 2009. It is now developed and maintained as part of the PyData project, which focuses on data packages for Python.
To use pandas more effectively, we recommend installing Anaconda as your Python environment. Once it is installed, you can run the following commands in Anaconda Powershell Prompt to manage pandas:
- view pandas version: run the 'conda list' command to check whether pandas is installed. If it is, its version appears in the list printed to the console.
- install pandas: if pandas is not installed, run the 'pip install pandas' command, which downloads and installs the latest version of pandas automatically.
- uninstall pandas: to remove pandas, run the 'pip uninstall pandas' command.
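
The same check can also be made from Python. The sketch below is a minimal illustration, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_version` is hypothetical and is not part of pandas or Anaconda.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pandas")
if version is None:
    print("pandas is not installed")
else:
    print("pandas version is " + version)
```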

# 2 How to use Pandas in GMNS data processing?
This section lists some examples that show the basic process of GMNS data processing with pandas. The examples use the dataset from https://github.com/asu-trans-ai-lab/QGIS_NeXTA4GMNS/tree/master/datasets.
## 2.1 Examples using node.csv

In [1]:
import pandas as pd  # import the pandas package
print('Pandas version is ' + pd.__version__)  # print the installed pandas version

Pandas version is 1.2.3


In [2]:
df_node = pd.read_csv(r'F:\python\ACU\macronet\node.csv')  # read the node.csv file
nodeCnt = df_node.index  # row index of node.csv; its length is the number of nodes
print('the number of nodes is {0}'.format(len(nodeCnt)))  # output the number of nodes

the number of nodes is 219
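
The row count can also be read with `len(df_node)` or from `df_node.shape`, which returns the number of rows and columns together. The sketch below is a minimal illustration: the three-node sample and its coordinate values are made up so that the code runs without node.csv, and the column names `node_id`, `x_coord` and `y_coord` are assumed rather than taken from the file above.

```python
import io

import pandas as pd

# Hypothetical three-node sample standing in for node.csv (values are made up).
sample_csv = io.StringIO(
    "node_id,x_coord,y_coord\n"
    "1,-111.930,33.423\n"
    "2,-111.929,33.424\n"
    "3,-111.941,33.424\n"
)
df_node = pd.read_csv(sample_csv)

# shape is a (rows, columns) tuple; len() of a DataFrame is its row count.
row_count, column_count = df_node.shape
print('the number of nodes is {0}'.format(len(df_node)))
print('the number of fields is {0}'.format(column_count))
```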


## 2.2 Examples using link.csv

In [17]:
import pandas as pd  # import the pandas package
df_link = pd.read_csv(r'F:\python\ACU\macronet\link.csv')  # read the link.csv file
print(df_link.columns)  # list the column names (fields) in link.csv

Index(['name', 'link_id', 'from_node_id', 'to_node_id', 'facility_type',
       'dir_flag', 'length', 'lanes', 'capacity', 'free_speed', 'link_type',
       'cost', 'geometry', 'TMC'],
      dtype='object')
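
The column list is useful for checking that a file contains the fields a later step relies on. The sketch below is only an illustration: the field names are copied from the output above, but the choice of which fields to check and the one-row sample table are assumptions, not requirements stated by GMNS.

```python
import pandas as pd

# Field names copied from the link.csv column list; which ones to check is an assumption.
expected_fields = ['link_id', 'from_node_id', 'to_node_id', 'length', 'lanes']

# Hypothetical one-row table standing in for link.csv; 'lanes' is left out on purpose.
df_link = pd.DataFrame({
    'link_id': [0],
    'from_node_id': [9],
    'to_node_id': [226],
    'length': [139.285820],
})

# Collect every expected field that is absent from the table.
missing = [field for field in expected_fields if field not in df_link.columns]
print('missing fields: {0}'.format(missing))
```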


In [14]:
print(df_link.head(3))  # show the first three records in link.csv; change the argument to show more or fewer rows

   name  link_id  from_node_id  to_node_id  facility_type  dir_flag  \
0   NaN        0             9         226            NaN       NaN   
1   NaN        1           226           9            NaN       NaN   
2   NaN        2           158         254            NaN       NaN   

       length  lanes  capacity  free_speed  link_type  cost  \
0  139.285820      1       NaN    -1.60934          1   NaN   
1  146.101478      1       NaN    -1.60934          1   NaN   
2  110.436568      1       NaN    40.23350          1   NaN   

                                            geometry  TMC  
0  LINESTRING (-111.93039 33.42312,-111.93038 33....  NaN  
1  LINESTRING (-111.9297 33.42413,-111.93013 33.4...  NaN  
2  LINESTRING (-111.94127 33.42431,-111.94071 33....  NaN  


In [5]:
# sort one column in link.csv and get the row that contains the maximum value of the column
lenSort = df_link.sort_values('length', ascending=False, kind='quicksort', ignore_index=False)
print(lenSort.head(1))

     name  link_id  from_node_id  to_node_id  facility_type  dir_flag  \
155   NaN      164           111         107            NaN       NaN   

         length  lanes  capacity  free_speed  link_type  cost  \
155  508.119779      1       NaN    -1.60934          1   NaN   

                                              geometry  TMC  
155  LINESTRING (-111.92628 33.42009,-111.92614 33....  NaN  


In [6]:
print(df_link['length'].max()) # get the length of the longest link in link.csv 

508.11977870229583
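
Sorting the whole table is not the only way to find the longest link. As a sketch under the assumption that only the single longest row is needed, `idxmax` returns the index label of the maximum and `nlargest` returns the top rows directly. The three rows below are copied from the `head(3)` output shown earlier so that the example runs without link.csv.

```python
import pandas as pd

# First three links from the head(3) output of link.csv.
df_link = pd.DataFrame({
    'link_id': [0, 1, 2],
    'length': [139.285820, 146.101478, 110.436568],
})

longest_label = df_link['length'].idxmax()  # index label of the longest link
longest_row = df_link.loc[longest_label]    # the full row for that label
top_one = df_link.nlargest(1, 'length')     # the same row, returned as a DataFrame

print(longest_label)
print(top_one)
```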


In [9]:
print(df_link.loc[1]) # get the row with index label 1 (the second row, because labels start at 0)

name                                                           NaN
link_id                                                          1
from_node_id                                                   226
to_node_id                                                       9
facility_type                                                  NaN
dir_flag                                                       NaN
length                                                  146.101478
lanes                                                            1
capacity                                                       NaN
free_speed                                                -1.60934
link_type                                                        1
cost                                                           NaN
geometry         LINESTRING (-111.9297 33.42413,-111.93013 33.4...
TMC                                                            NaN
Name: 1, dtype: object
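
`loc` selects by index label, while `iloc` selects by position. The two agree on a freshly read file but differ once the rows are reordered. The sketch below illustrates this with the three rows from the `head(3)` output; it is a small demonstration, not part of the original workflow.

```python
import pandas as pd

# First three links from the head(3) output of link.csv.
df_link = pd.DataFrame({
    'link_id': [0, 1, 2],
    'length': [139.285820, 146.101478, 110.436568],
})

# After sorting, the index labels keep their original values but the positions change.
by_length = df_link.sort_values('length', ascending=False)

id_by_label = by_length['link_id'].loc[1]      # row whose index label is 1
id_by_position = by_length['link_id'].iloc[1]  # second row in the sorted order

print(id_by_label)
print(id_by_position)
```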


In [10]:
print(df_link[df_link['lanes']==1]) #get the links with 1 lane in link.csv

     name  link_id  from_node_id  to_node_id  facility_type  dir_flag  \
0     NaN        0             9         226            NaN       NaN   
1     NaN        1           226           9            NaN       NaN   
2     NaN        2           158         254            NaN       NaN   
3     NaN        3           254         158            NaN       NaN   
4     NaN        4            13          86            NaN       NaN   
..    ...      ...           ...         ...            ...       ...   
503   NaN      540            67          66            NaN       NaN   
504   NaN      541           227         226            NaN       NaN   
505   NaN      542           226         227            NaN       NaN   
506   NaN      543           234          28            NaN       NaN   
507   NaN      544           236         238            NaN       NaN   

         length  lanes  capacity  free_speed  link_type  cost  \
0    139.285820      1       NaN    -1.60934          1   
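
Filter conditions can be combined. The sketch below, which uses the three rows from the `head(3)` output so that it runs without link.csv, keeps the links that have one lane and a positive `free_speed`; the second condition is chosen only for illustration.

```python
import pandas as pd

# Values copied from the first three rows of link.csv shown earlier.
df_link = pd.DataFrame({
    'link_id': [0, 1, 2],
    'lanes': [1, 1, 1],
    'free_speed': [-1.60934, -1.60934, 40.23350],
})

# Each condition goes in its own parentheses; '&' means "and", '|' means "or".
mask = (df_link['lanes'] == 1) & (df_link['free_speed'] > 0)
selected = df_link[mask]
print(selected)
```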

## 2.3 Examples using zone.csv

In [3]:
import pandas as pd #import pandas package
df_zone=pd.read_csv(r'F:\python\ACU\macronet\zone.csv') #read zone.csv file
print('the total number of zones is {0}'.format(len(df_zone.index))) #get the total number of zones

the total number of zones is 150


In [4]:
typeA = df_zone.loc[df_zone['name'].str.startswith('A')] # select the zones whose name starts with the letter A
print(typeA) # print the rows for zones whose name starts with the letter A

   zone_id      name                                           geometry
0        1  A2 - 161  POLYGON ((-111.945007 33.427002, -111.944008 3...
1        2    A3 - 1  POLYGON ((-111.945007 33.426003, -111.944008 3...
2        6    A7 - 2  POLYGON ((-111.945007 33.422001, -111.944008 3...
3        7    A8 - 3  POLYGON ((-111.945007 33.421001, -111.944008 3...
4        9   A10 - 4  POLYGON ((-111.945007 33.419003, -111.944008 3...
5       13   A14 - 5  POLYGON ((-111.945007 33.415001, -111.944008 3...
6       16   A17 - 7  POLYGON ((-111.945007 33.412003, -111.944008 3...
7       18   A19 - 8  POLYGON ((-111.945007 33.410000, -111.944008 3...
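
Note that `str.contains('A')` matches an 'A' anywhere in the name, while `str.startswith('A')` matches only at the beginning. The sketch below shows the difference; the first two names come from the output above, and the third name, 'BA1 - 9', is hypothetical and added only to make the two results differ.

```python
import pandas as pd

# Two names from zone.csv plus one hypothetical name that has 'A' in second position.
df_zone = pd.DataFrame({
    'zone_id': [1, 2, 99],
    'name': ['A2 - 161', 'A3 - 1', 'BA1 - 9'],
})

contains_a = df_zone.loc[df_zone['name'].str.contains('A')]
starts_with_a = df_zone.loc[df_zone['name'].str.startswith('A')]

print(len(contains_a))     # counts every name with an 'A' in it
print(len(starts_with_a))  # counts only names that begin with 'A'
```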


In [13]:
print('the count of fields in zone.csv is {0}'.format(len(typeA.columns)))

the count of fields in zone.csv is 3


In [5]:
print('the count of the A zone is {0}'.format(len(typeA.index))) #get the count of the A zone

the count of the A zone is 8


In [None]:
typeA.to_csv(r'F:\python\ACU\macronet\typeA.csv') # export the A-zone rows to a CSV file
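
By default, `to_csv` also writes the row index as an unnamed first column, which pandas labels 'Unnamed: 0' when the file is read back. Passing `index=False` leaves it out. The sketch below demonstrates both cases with an in-memory buffer instead of a file on disk; the two-row table is a shortened stand-in for `typeA`.

```python
import io

import pandas as pd

# Shortened stand-in for the typeA table (two rows, geometry column omitted).
typeA = pd.DataFrame({'zone_id': [1, 2], 'name': ['A2 - 161', 'A3 - 1']})

# Default export: the index is written as an extra, unnamed first column.
with_index = io.StringIO()
typeA.to_csv(with_index)
with_index.seek(0)
columns_with_index = pd.read_csv(with_index).columns.tolist()

# Export without the index: only the data columns are written.
without_index = io.StringIO()
typeA.to_csv(without_index, index=False)
without_index.seek(0)
columns_without_index = pd.read_csv(without_index).columns.tolist()

print(columns_with_index)
print(columns_without_index)
```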

## 2.4 Examples using online resources
The resources used in this article can also be read online. Some useful examples follow.
First, the GitHub URL has to be converted to its raw form, in one of two ways.

CASE ONE:
Insert 'raw.' into the URL. For example, change https://githubusercontent.com/cs109/2014_data/master/countries.csv
to https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv

CASE TWO:
Remove 'blob' and replace 'github' with 'raw.githubusercontent'. For example, change https://github.com/asu-trans-ai-lab/QGIS_NeXTA4GMNS/blob/master/datasets/ASU/macronet/zone.csv
to https://raw.githubusercontent.com/asu-trans-ai-lab/QGIS_NeXTA4GMNS/master/datasets/ASU/macronet/zone.csv

Note: This approach needs a working network connection. Otherwise, you may get error messages such as '404: Not found' or 'Connection aborted'.
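
The rule in CASE TWO can be written as a small helper. The sketch below is a minimal illustration using plain string operations; the function name `to_raw_github_url` is hypothetical, and the function only handles URLs of the `https://github.com/.../blob/...` form shown above.

```python
def to_raw_github_url(url):
    """Convert a GitHub 'blob' page URL to its raw.githubusercontent.com form."""
    prefix = 'https://github.com/'
    if not url.startswith(prefix) or '/blob/' not in url:
        return url  # leave anything that is not a GitHub blob URL unchanged
    # Drop the first '/blob/' segment and switch to the raw host.
    path = url[len(prefix):].replace('/blob/', '/', 1)
    return 'https://raw.githubusercontent.com/' + path


page_url = ('https://github.com/asu-trans-ai-lab/QGIS_NeXTA4GMNS/'
            'blob/master/datasets/ASU/macronet/zone.csv')
raw_url = to_raw_github_url(page_url)
print(raw_url)
```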

In [5]:
import pandas as pd
import io
import requests
# raw URL of the file to read
url='https://raw.githubusercontent.com/asu-trans-ai-lab/QGIS_NeXTA4GMNS/master/datasets/ASU/macronet/zone.csv'
# another raw URL that can be read the same way:
# https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv
response = requests.get(url)  # download the file
response.raise_for_status()  # stop here on HTTP errors such as '404: Not found'
pd_url = pd.read_csv(io.StringIO(response.content.decode('utf-8')))  # parse the downloaded text as CSV
pd_url.head()

   zone_id      name                                           geometry
0        1  A2 - 161  POLYGON ((-111.945007 33.427002, -111.944008 3...
1        2    A3 - 1  POLYGON ((-111.945007 33.426003, -111.944008 3...
2        6    A7 - 2  POLYGON ((-111.945007 33.422001, -111.944008 3...
3        7    A8 - 3  POLYGON ((-111.945007 33.421001, -111.944008 3...
4        9   A10 - 4  POLYGON ((-111.945007 33.419003, -111.944008 3...
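
`pd.read_csv` also accepts a URL directly, so the download and decode steps can be left to pandas. The sketch below is a minimal illustration: the wrapper name `read_csv_from_url` is hypothetical, and the demonstration uses a `file://` URL that points at a temporary two-row sample, so it runs without a network connection.

```python
import pathlib
import tempfile

import pandas as pd


def read_csv_from_url(url):
    """Read a CSV file straight from a URL (http, https, ftp or file)."""
    return pd.read_csv(url)


# Offline demonstration: write a small sample file and read it through a file:// URL.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = pathlib.Path(tmp_dir) / 'zone_sample.csv'
    sample_path.write_text('zone_id,name\n1,A2 - 161\n2,A3 - 1\n', encoding='utf-8')
    df_sample = read_csv_from_url(sample_path.as_uri())

print(df_sample)
```

With a working connection, passing the raw GitHub URL of zone.csv to the same function should return the table shown above.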


# 3 References
- Information on the GMNS data format: https://github.com/zephyr-data-specs/GMNS
- The official pandas tutorials: https://pandas.pydata.org/pandas-docs/version/0.15/tutorials.html
- The test dataset for this article: https://github.com/asu-trans-ai-lab/QGIS_NeXTA4GMNS/tree/master/datasets
- How to read a CSV file from GitHub using pandas: https://stackoverflow.com/questions/55240330/how-to-read-csv-file-from-github-using-pandas