## Data Types in GIS - Essentials of GIS Chapter 5 

#### Run this cell to connect to your GIS and get started:

In [7]:
from arcgis.gis import GIS
import pandas as pd
gis = GIS("home")  # "home" connects as the user currently signed in to ArcGIS Online or ArcGIS Enterprise

#### 1. Introduction

In programming and in GIS, data types matter because they define what kind of information a variable (or an attribute field) can hold. Understanding data types helps keep data consistent and accurate, and it determines which operations you can perform on a value: you can add two numbers, for example, but not a number and a piece of text. Let's look at some of the basic data types in Python, a scripting language widely used in GIS software such as ArcGIS and QGIS.

#### 2. String data type

**Description:** A string is a sequence of characters. In Python, strings are enclosed within single (' ') or double (" ") quotes. They can contain letters, numbers, and special characters.

In [8]:
city_name = "San Francisco"
print("City Name:", city_name)
print("Data Type:", type(city_name))

City Name: San Francisco
Data Type: <class 'str'>
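A few common string operations are sketched below. The FIPS code is an illustration added here, not part of the original example: it shows why identifiers that look numeric (FIPS codes, ZIP codes) are usually stored as strings, since converting them to integers drops leading zeros.

```python
city_name = "San Francisco"

print(len(city_name))         # number of characters, including the space
print(city_name.upper())      # strings have built-in methods
print(city_name.split(" "))   # split into a list of words

# Codes that look numeric are often best kept as text
fips_code = "06075"           # San Francisco County's FIPS code, stored as a string
print(int(fips_code))         # converting to int drops the leading zero
```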


#### 3. Floating-point (float) data type 

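**Description:** Floating-point numbers (often just called "floats") are written with a decimal point and can represent fractional values such as coordinates. They approximate real numbers: because they are stored in binary with limited precision, many decimal values cannot be stored exactly.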

In [11]:
latitude = 37.7749
print("Latitude:", latitude)
print("Data Type:", type(latitude))

Latitude: 37.7749
Data Type: <class 'float'>
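The approximate nature of floats matters when comparing coordinates or measurements. The sketch below uses only the standard library; `math.isclose` compares two floats with a tolerance instead of requiring exact equality. One hundredth of a degree of latitude is roughly 1.1 km, so rounding to two decimal places keeps about kilometre-level precision.

```python
import math

latitude = 37.7749

# Many decimal values have no exact binary representation
print(0.1 + 0.2 == 0.3)               # False
print(math.isclose(0.1 + 0.2, 0.3))   # True: compare floats with a tolerance

# Rounding to 2 decimal places keeps roughly 1 km of latitude precision
print(round(latitude, 2))
```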


#### 4. Integer data type

**Description:** An integer is a whole number, without any fractional part. It can be positive, negative, or zero.

In [12]:
population = 883305
print("Population:", population)
print("Data Type:", type(population))

Population: 883305
Data Type: <class 'int'>
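Attribute values read from text files often arrive as strings and must be converted before doing arithmetic. This sketch also shows how Python's two division operators differ, and that `int()` truncates a float toward zero (so a longitude of -122.4194 becomes -122, not -123).

```python
# Values read from text files (such as CSVs) often arrive as strings
population_text = "883305"
population = int(population_text)    # convert text to an integer

print(population / 1000)    # true division always returns a float
print(population // 1000)   # floor division of two ints returns an int
print(int(-122.4194))       # int() truncates toward zero, dropping the fraction
```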


#### 5. Boolean data type

**Description:** The boolean data type can have one of two values, True or False. It's often used to represent the truth value of an expression.

In [13]:
is_coastal = True
print("Is the city coastal?", is_coastal)
print("Data Type:", type(is_coastal))

Is the city coastal? True
Data Type: <class 'bool'>
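Booleans usually come from comparisons, and in Python `bool` is a subclass of `int`, so `True` counts as 1 and `False` as 0. The sketch below uses that to count coastal cities; the flags are the "Is Coastal" values from the cities table used later in this notebook.

```python
population = 883305
is_large = population > 1_000_000    # a comparison evaluates to a bool
print(is_large, type(is_large))

# "Is Coastal" values from the cities table used later in this notebook
coastal_flags = [True, True, True, False, False, False, True]
print(sum(coastal_flags))            # True counts as 1 and False as 0
print(isinstance(True, int))         # bool is a subclass of int
```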


#### 6. Date data type

**Description:** The date data type represents a specific day, month, and year. In Python, the datetime module provides the date class to handle date values. Dates are essential in GIS for time-series analyses, tracking changes over time, or understanding temporal patterns.

In [17]:
from datetime import date

established_date = date(1850, 4, 15) # Representing April 15, 1850
print("Established Date:", established_date)  # prints the date in ISO format (YYYY-MM-DD)
print("Data Type:", type(established_date))

Established Date: 1850-04-15
Data Type: <class 'datetime.date'>
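Date objects support the arithmetic that time-series analysis relies on. In this sketch, `later_date` is a hypothetical comparison date chosen only for illustration. Subtracting two dates yields a `timedelta`, and adding a `timedelta` to a date yields another date.

```python
from datetime import date, timedelta

established_date = date(1850, 4, 15)
later_date = date(1860, 6, 1)  # hypothetical comparison date, for illustration only

# Subtracting two dates gives a timedelta; .days is the whole-day difference
elapsed = later_date - established_date
print("Days between:", elapsed.days)

# Adding a timedelta to a date gives another date
print("30 days later:", established_date + timedelta(days=30))

# ISO 8601 strings (YYYY-MM-DD) convert to date objects
print(date.fromisoformat("1850-04-15") == established_date)
```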


**Date Representation in Spreadsheet Programs (e.g., Excel)**

Spreadsheet programs like Excel have their own approach to storing and representing dates. At its core, Excel stores dates as serial numbers:

Serial Numbers: In Excel, dates are stored as sequential serial numbers so that they can be used in calculations. Essentially, each date is an integer value representing the number of days since a starting date, known as the "epoch" or "base date."

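In Excel's default 1900 date system, the base date is January 1, 1900. This means that January 1, 1900, is serial number 1, January 2, 1900, is serial number 2, and so on. (Excel also offers a 1904 date system, historically the default on the Mac, which counts from January 1, 1904, instead.)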

This system makes date arithmetic easy. For example, subtracting two date serial numbers gives the number of days between those two dates.

Display Formats: While the underlying value of a date in Excel is a serial number, Excel can display the date in various formats, such as "MM/DD/YYYY" or "DD-MM-YYYY". The display format does not change the stored value; it only affects how the date appears in the spreadsheet.

Time Representation: Excel also represents time using fractions of a day. For instance, 0.5 represents noon since it's halfway through the day. So, the serial number 1.5 would correspond to noon on January 1, 1900.

Leap Years: Excel treats 1900 as a leap year, even though it was not one, for compatibility with Lotus 1-2-3, an earlier spreadsheet program. As a result, Excel accepts February 29, 1900 (serial number 60), a date that never existed, and every serial number from 61 onward is one higher than a true count of days from the base date would give.

Limitations & Considerations: Because of this serial number system, Excel's 1900 date system can only represent dates from January 1, 1900, through December 31, 9999. Excel does not convert earlier dates to serial numbers; it keeps them as text, so date arithmetic will not work on them. Be especially cautious with historical datasets that include dates before 1900 (such as the 1850 date used above).

This method of date representation is common among spreadsheet programs because it allows for straightforward date calculations and flexibility in date display formats. However, GIS professionals and other data analysts need to be aware of this system's intricacies when importing dates from Excel into GIS or other software platforms.
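The serial-number rules above can be sketched as a small Python conversion function. This is a minimal sketch, assuming the default 1900 date system (not the 1904 system); the function name `excel_serial_to_datetime` is hypothetical, not part of any library. It uses two bases to absorb the leap-year quirk: serials below 60 count from December 31, 1899, and serials from 61 onward count from December 30, 1899. The fractional part of a serial is treated as a fraction of a day.

```python
from datetime import datetime, timedelta

def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel serial number (assumed 1900 date system) to a datetime."""
    if serial < 1:
        raise ValueError("serial numbers in the 1900 date system start at 1")
    if 60 <= serial < 61:
        raise ValueError("serial 60 is Excel's nonexistent February 29, 1900")
    # Before the fictitious leap day, serial 1 is January 1, 1900.
    # From serial 61 on, Excel's count runs one day ahead, so the base moves back a day.
    base = datetime(1899, 12, 31) if serial < 60 else datetime(1899, 12, 30)
    # timedelta accepts fractional days, so 0.5 becomes 12 hours
    return base + timedelta(days=serial)

print(excel_serial_to_datetime(1))      # January 1, 1900
print(excel_serial_to_datetime(1.5))    # noon on January 1, 1900
print(excel_serial_to_datetime(61))     # March 1, 1900
```

For a column of modern serials (all 61 or greater), pandas can apply the same shift in one call with `pd.to_datetime(serials, unit="D", origin="1899-12-30")`.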

#### Hey, remember attribute measurement scales?

In [1]:
import ipywidgets as widgets
from IPython.display import display, clear_output

# Measurement scales offered in every dropdown ('' means "not answered yet")
scale_options = ['', 'Nominal', 'Ordinal', 'Interval and Ratio', 'Cyclic']

# Create one dropdown widget per data type for matching
string_widget = widgets.Dropdown(options=scale_options, value='', description='String:')
integer_widget = widgets.Dropdown(options=scale_options, value='', description='Integer:')
float_widget = widgets.Dropdown(options=scale_options, value='', description='Float:')
boolean_widget = widgets.Dropdown(options=scale_options, value='', description='Boolean:')
date_widget = widgets.Dropdown(options=scale_options, value='', description='Date:')

# Button to check answers
check_button = widgets.Button(description="Check Answers")

# Output widget to display the result
output = widgets.Output()

# Logic to check answers (integers are accepted as either Interval and Ratio or Ordinal)
def check_answers(button):
    with output:
        clear_output()
        if (string_widget.value == "Nominal" and
                integer_widget.value in ("Interval and Ratio", "Ordinal") and
                float_widget.value == "Interval and Ratio" and
                boolean_widget.value == "Nominal" and
                date_widget.value == "Cyclic"):
            print("Correct! Well done!")
        else:
            print("Try again. Review the matches and make sure they align with the explanations provided.")

check_button.on_click(check_answers)

# Display the widgets
display(string_widget, integer_widget, float_widget, boolean_widget, date_widget, check_button, output)


#### Now, let's work with some data

In [10]:
# Creating a DataFrame with cities, their latitudes, longitudes, populations, and coastal information

import pandas as pd

cities_data = {
    "City Name": ["San Francisco", "Los Angeles", "New York", "Chicago", "Houston", "Phoenix", "Philadelphia"],
    "Latitude": [37.7749, 34.0522, 40.7128, 41.8781, 29.7604, 33.4484, 39.9526],
    "Longitude": [-122.4194, -118.2437, -74.0060, -87.6298, -95.3698, -112.0740, -75.1652],
    "Population": [883305, 3792621, 8175133, 2695598, 2129784, 1660272, 1526006],
    "Is Coastal": [True, True, True, False, False, False, True]
}

cities_df = pd.DataFrame(cities_data)
cities_df

| | City Name | Latitude | Longitude | Population | Is Coastal |
|---|---|---|---|---|---|
| 0 | San Francisco | 37.7749 | -122.4194 | 883305 | True |
| 1 | Los Angeles | 34.0522 | -118.2437 | 3792621 | True |
| 2 | New York | 40.7128 | -74.006 | 8175133 | True |
| 3 | Chicago | 41.8781 | -87.6298 | 2695598 | False |
| 4 | Houston | 29.7604 | -95.3698 | 2129784 | False |
| 5 | Phoenix | 33.4484 | -112.074 | 1660272 | False |
| 6 | Philadelphia | 39.9526 | -75.1652 | 1526006 | True |


In [21]:
cities_df.to_csv("cities.csv", index=False)
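A CSV file stores only text, so data types have to be re-inferred when the file is read back. The sketch below builds a one-row DataFrame from the San Francisco values used earlier (the "Established" column is added here for illustration), round-trips it through CSV text in memory, and checks the resulting column types. It assumes pandas is installed; numbers and booleans are typically re-inferred, while dates come back as plain text unless `parse_dates` is passed to `pd.read_csv`.

```python
import io
from datetime import date

import pandas as pd

# One row from the examples above; "Established" holds a Python date object
df = pd.DataFrame({
    "City Name": ["San Francisco"],
    "Latitude": [37.7749],
    "Population": [883305],
    "Is Coastal": [True],
    "Established": [date(1850, 4, 15)],
})
print(df.dtypes)

# Write to CSV text and read it back, as happens with cities.csv
csv_text = df.to_csv(index=False)
plain = pd.read_csv(io.StringIO(csv_text))
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Established"])

print(plain.dtypes)    # "Established" comes back as text, not a date
print(parsed.dtypes)   # parse_dates restores a datetime column
```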

#### Now, create a map in ArcGIS Online Notebooks and add these data as points

In [23]:
example_map = gis.map("USA")

In [24]:
example_map


In [25]:
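# Convert the example data from a DataFrame to a spatially enabled DataFrame (with a formatted SHAPE column).
# x is longitude and y is latitude; sr=4326 is the WGS 84 spatial reference.
from arcgis.features import GeoAccessor  # registers the .spatial accessor on pandas DataFrames

sdf = pd.DataFrame.spatial.from_xy(cities_df, "Longitude", "Latitude", sr=4326)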

In [26]:
sdf.head()

| | City Name | Latitude | Longitude | Population | Is Coastal | SHAPE |
|---|---|---|---|---|---|---|
| 0 | San Francisco | 37.7749 | -122.4194 | 883305 | True | {"spatialReference": {"wkid": 4326}, "x": -122... |
| 1 | Los Angeles | 34.0522 | -118.2437 | 3792621 | True | {"spatialReference": {"wkid": 4326}, "x": -118... |
| 2 | New York | 40.7128 | -74.006 | 8175133 | True | {"spatialReference": {"wkid": 4326}, "x": -74.... |
| 3 | Chicago | 41.8781 | -87.6298 | 2695598 | False | {"spatialReference": {"wkid": 4326}, "x": -87.... |
| 4 | Houston | 29.7604 | -95.3698 | 2129784 | False | {"spatialReference": {"wkid": 4326}, "x": -95.... |
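Each value in the SHAPE column is a point geometry in Esri JSON form, with `x` holding the longitude and `y` the latitude. The sketch below mirrors that structure with plain dictionaries and the standard library only; it does not use the `arcgis` package, and the helper `point_geometry` is hypothetical. Its range check catches a common mistake, swapping latitude and longitude, whenever the swapped value falls outside the valid latitude range.

```python
import json

# Longitude/latitude pairs copied from cities_df above (x = longitude, y = latitude)
cities = {
    "San Francisco": (-122.4194, 37.7749),
    "Chicago": (-87.6298, 41.8781),
}

def point_geometry(lon: float, lat: float, wkid: int = 4326) -> dict:
    """Build an Esri JSON-style point dict like the ones in the SHAPE column."""
    if not -180 <= lon <= 180 or not -90 <= lat <= 90:
        raise ValueError(f"out of range for WGS 84 degrees: lon={lon}, lat={lat}")
    return {"spatialReference": {"wkid": wkid}, "x": lon, "y": lat}

shapes = {name: point_geometry(lon, lat) for name, (lon, lat) in cities.items()}
print(json.dumps(shapes["San Francisco"]))
```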


In [27]:
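# Publish the spatially enabled DataFrame as a new hosted feature layer in your content,
# so it can be added to the map (the title is the name the new item will get)

fl = sdf.spatial.to_featurelayer(title="Example US cities")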

In [28]:
example_map.add_layer(fl)

In [29]:
example_map
