<h1><b> DSC 170: Introduction to GIS, ArcGIS, ArcGIS Online, and ArcGIS API for Python (ArcGIS-1)</b></h1>

This lecture will cover:
* GIS definitions and key points    
    - in a narrow sense: integrating data based on relationships in space
* Intro to ArcGIS and ArcGIS Online
* ArcGIS API for Python: main components
* How different components of ArcGIS ecosystem work together
    * interactions between ArcGIS API for Python and ArcGIS Online
    * ArcGIS API for Python, and ArcPy
* Beginning vector data analysis with ArcGIS API
* A typical data science setup: data enrichment

Let's review where we started: https://docs.google.com/presentation/d/1qzzeryYdUM_DMgjGBUlKi2J-HguXbR9dJ6HBmbiMdws/edit#slide=id.p24

## GIS has been defined in several ways:
* naively: software acquired to make hardware manage spatial databases (maps)
* an integrated package for the input, storage, analysis and output of spatial information (just the computer component...)
* “a system of hardware, software, data, people, organizations and institutional arrangements for collecting, storing, analyzing, and disseminating information about areas of the earth” (Dueker and Kjerne, 1989)  
   - "people" and "organizations" are important
* D. Cowen’s definition: a spatial decision support system (SDSS)
* "A geographic information system (GIS) is a framework for gathering, managing, and analyzing data." (https://www.esri.com/en-us/what-is-gis/overview) ??
* people still write about it... https://www.gislounge.com/what-is-gis/

### Key points to understand GIS:

* Includes: hardware, software, data, people, organizations, institutional arrangements
* An __Information System__ applied to geographic data; geography (spatial relations) is used to integrate information
* Maps in GIS are graphic representations of a __digital database__ (what is the difference between a GIS and an online mapping system??)
* Combination of attribute and spatial data, with __dynamic linkage__ between them
* Implements spatial and attribute queries, selections, variety of analytical procedures 
* Ability to __integrate__ data from a variety of sources/formats: the concept of map layers
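
The difference between attribute and spatial queries can be sketched without any GIS library. The toy layer below is hypothetical (made-up names, approximate coordinates), and a real GIS would use proper geometry types and spatial indexes rather than a list scan.

```python
# A toy "layer": each feature links attributes to a location (lon, lat).
# All names and coordinates are illustrative, not real data.
features = [
    {"name": "Library", "type": "building", "lon": -117.2376, "lat": 32.8811},
    {"name": "Pier", "type": "structure", "lon": -117.2573, "lat": 32.8663},
    {"name": "Museum", "type": "building", "lon": -117.1500, "lat": 32.7300},
]


def attribute_query(layer, field, value):
    """Select features by an attribute value (like a WHERE clause)."""
    return [f for f in layer if f.get(field) == value]


def spatial_query(layer, xmin, ymin, xmax, ymax):
    """Select features whose point falls inside a bounding box."""
    return [
        f for f in layer
        if xmin <= f["lon"] <= xmax and ymin <= f["lat"] <= ymax
    ]


buildings = attribute_query(features, "type", "building")
in_box = spatial_query(features, -117.30, 32.85, -117.20, 32.90)

# Combining both selections: buildings located inside the box
both = [f for f in buildings if f in in_box]
print([f["name"] for f in both])
```

Because attributes and locations live in the same record, either kind of selection can narrow the other; this is the "dynamic linkage" mentioned in the list.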



## ArcGIS product line 

... is developed by ESRI (http://www.esri.com). There are many products these days (https://www.esri.com/en-us/arcgis/products/index), including core enterprise components for the desktop (ArcGIS Desktop, ArcGIS Pro), servers, mobile, etc. There are also a number of domain-specific solutions, focused on oil and gas, governments at different levels, businesses, defense, maritime, aviation, utilities, transportation, community development, etc. There are also several development products, and we will use several of them, primarily ArcGIS API for Python. 



## ArcGIS API for Python, and interaction with ArcGIS Online  


ArcGIS API for Python includes several modules (from https://developers.arcgis.com/python/guide/overview-of-the-arcgis-api-for-python/)
<img src='img/guide_api_modules_overview.png' >

* GIS (entry point: management of users and content, access datasets)
* in purple: cover different types of spatial data (today we'll work with features; later with rasters)
* in blue: general operations used across types: geoprocessing, geoenrichment, geocoding, geometry management, etc.
* in orange: visualization modules and web apps management

We'll explore several modules in this class, but won't cover them exhaustively - there is documentation online (https://developers.arcgis.com/python/). We will only go into details where it is conceptually important. 

### The GIS, and Content Manager.

ArcGIS Python API works with data of different types via __Content Manager__. 

__Content manager__ allows users to manage the data and maps that they have stored in their ArcGIS account or organization. It allows users to add, edit, and delete data and maps, as well as share them with other users within their organization or with the public. 


In [None]:
import warnings

import arcgis
from arcgis.gis import GIS
from arcgis import geometry
from arcgis.features import GeoAccessor, GeoSeriesAccessor
import pandas as pd
import os

print(arcgis.__version__)
print(pd.__version__)



In [None]:
# Log in to GIS portal

# gis = GIS() # for anonymous access

# this way to login is using UCSD Single-Sign-On accounts. See separate doc on how to create a client_id
gis=GIS("https://ucsdonline.maps.arcgis.com/home", client_id="bZshlNXFuaR2KHff") 

# my ESRI owner id for the SSO account is izaslavsky_UCSDOnline - yours will look similar (that is, your SSOID_UCSDOnline)

# my old style account: you may still see it in demo notebooks. 
# gis = GIS(username='izaslavsky_ucsd')  # this will ask for password. You can also include your password in this string




In [None]:
my_content = gis.content.search(query="owner:izaslavsky_UCSDOnline", max_items=1000)
my_content

### Using more than one GIS object

Important: In the above, I redefined the gis object. 
But you may have more than one GIS object in your code. 

For example, you could use agol = GIS(...) and age = GIS(...), and then use specific functions that are available at these GIS objects.

(AGE == ArcGIS Enterprise)

__ArcGIS Online__ is a cloud-based mapping and analysis platform developed by Esri. It is a web-based application that allows users to create and share maps, data, and geographic information. ArcGIS Online includes a wide range of tools and resources for creating, managing, and sharing spatial data and maps, including tools for mapping, analysis, data visualization, and data management. It also includes a large collection of pre-made maps, data layers, and applications that can be used as a starting point for creating new maps and analyses.

__ArcGIS Enterprise__ is a comprehensive mapping and analysis platform by Esri, deployed on-premises or in a private cloud. Using it doesn't require the use of credits (with a few exceptions).
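
A minimal sketch of working with several GIS objects side by side. The helper name `search_all_portals` and the labels `agol` and `age` are our own; the only API call assumed is `content.search(query=..., max_items=...)`, as used elsewhere in this notebook.

```python
def search_all_portals(portals, query, max_items=10):
    """Run the same content search against several GIS connections.

    portals: dict mapping a label (e.g. "agol", "age") to a connected GIS object.
    Returns a dict mapping each label to that portal's list of results.
    """
    results = {}
    for label, portal in portals.items():
        results[label] = portal.content.search(query=query, max_items=max_items)
    return results


# Usage, assuming two GIS objects were created earlier:
# hits = search_all_portals({"agol": agol, "age": age}, "title:San Diego")
# for label, items in hits.items():
#     print(label, len(items))
```

Keeping the connections in a dictionary makes it explicit which portal each result came from.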


In [None]:
# once we established a GIS object, we can search for data of different types available through that object:
search_result = gis.content.search(query="title:San Diego", item_type="Feature Layer")


# search_result = gis.content.search(query="title:San Diego", item_type="Feature Layer", outside_org=True)
# important to add outside_org when you are logged in!!

# see search reference at https://developers.arcgis.com/rest/users-groups-and-items/search-reference.htm
# fields: id, owner, created (in UNIX time), modified, title, type, typekeywords, description, tags, snippet
# spatialreference, access, group, numratings, avgrating, ...

# search also supports wildcards
search_result

In [None]:
# we can address individual search results:

for i, g in enumerate(search_result):
    print(i,g)
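
Search strings combine `field:value` pairs from the search reference linked above. The helper below is a hypothetical convenience (not part of the ArcGIS API) that assembles such a string; it assumes that values containing spaces should be quoted and that terms are joined with `AND`.

```python
def build_query(**fields):
    """Join field:value pairs with AND; quote values that contain spaces."""
    parts = []
    for field, value in fields.items():
        text = str(value)
        if " " in text:
            text = f'"{text}"'
        parts.append(f"{field}:{text}")
    return " AND ".join(parts)


query = build_query(title="San Diego", owner="izaslavsky_UCSDOnline")
print(query)

# The string can then be passed to the search call used above, e.g.
# gis.content.search(query=query, item_type="Feature Layer", max_items=50)
```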

### Exploring ArcGIS Online

These same resources are available through ArcGIS Online. Log in to arcgis.com using your username and password. Compare the Content Manager in Python with _Content_ in ArcGIS Online. Other menu items in ArcGIS Online can also be accessed programmatically from the Python API. 

### What types of data are managed in ArcGIS (and we can request through the content manager) 

See https://developers.arcgis.com/rest/users-groups-and-items/items-and-item-types.htm

There are too many specific types to describe (see the table on the web page). The main groups are:
* maps (consist of one or more layers)
* layers (information feeds)
* styles (symbols, colors, north arrows, etc.)
* tools (extract additional info/do processing; processing services)
* applications (maps + widgets/tools)
* datafiles (content)
* notebooks (!)
* deep learning packages, .dlpk (!!)

Each item (on the web) has a URL, as well as a title, description, type, keywords, thumbnail, etc. Some of these metadata elements are mandatory: your resources won't be published without them. This requirement derives from long experience of dealing with spatial data that lacked adequate metadata descriptions.

Compare it with how you manage data with pandas/geopandas: you have to do it yourself or use third-party workflow and metadata tools. Now, you can keep all your data online, publish data to your account manually and from Python, share it within your group or organization or the world, and also retrieve the data into your Python code from there.

In this class: we have a DSC170 Data group set up. You are all members of this group (or will be soon) 

In [None]:
from IPython.display import display
for i, g in enumerate(search_result):
    print(i)
    display(g)


### Adding content to ArcGIS Online

There are several ways to add content, including manually via the ArcGIS Online user interface, or programmatically using the Content Manager API in Python. 

You can add an item directly using the add() method on ContentManager and passing a dictionary with item properties. Once an item is added, it can be used to publish web data services of several types. This is shown in the next few cells.



In [None]:
# set data location (uncomment one of them)
data_location = os.environ["HOME"]+"/public/datasets/"  # in the shared datahub 
# data_location = "../../8. Data/"    # on my local install

In [None]:
# remember files we used in previous lectures, e.g.
! ls -ltr ../../../../public/datasets/california

# let's add it to ArcGIS so that it can be later discovered


In [None]:
shp_path = data_location+ "california/california_coastline.zip"

# Need to give it a unique title. 
# One can have multiple shapefiles added, but published feature layer names should be unique within an organization.

shp_properties = {'title':'California coastline', 'tags':'sample data'}
coastline_shapefile = gis.content.add(item_properties = shp_properties, data=shp_path)



In [None]:
# if the item already exists in AGOL - it was added already
coastline_shapefile = gis.content.search(query="California_coastline owner:izaslavsky_ucsd",max_items=50)
counter=0
for item in coastline_shapefile:
    print(counter, item)
    counter+=1


### Metadata for an item you added to AGOL

For the item above, we added the absolute minimum set of properties! Ideally, we'd add more metadata. 
Lack of metadata for your GIS products leads to inability to reproduce the results and to explain the results to others!

ArcGIS forces you to add some descriptions. 

Let's look at it in ArcGIS Online!
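
A small sketch of checking an `item_properties` dictionary before adding an item. The list of recommended keys is our own checklist for illustration, not the set that ArcGIS enforces, and the snippet and description texts are placeholders.

```python
# Assumption: a course-level checklist, not an official list of required keys
RECOMMENDED_KEYS = ("title", "tags", "snippet", "description")


def missing_metadata(item_properties, recommended=RECOMMENDED_KEYS):
    """Return the recommended keys that are absent or empty."""
    return [
        key for key in recommended
        if not str(item_properties.get(key, "")).strip()
    ]


minimal = {"title": "California coastline", "tags": "sample data"}

fuller = {
    **minimal,
    "type": "Shapefile",
    "snippet": "Coastline of California, used as sample data in class.",
    "description": "Placeholder: add source, date, and processing notes here.",
}

print(missing_metadata(minimal))
print(missing_metadata(fuller))
```

A dictionary that passes this check could then be handed to `gis.content.add(item_properties=..., data=...)` as in the cell above.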

In [None]:
# What if there is already a shapefile with this name? It's always a good idea to check...

search_result = gis.content.search(query="california_coastline.zip", max_items=100)

# this returns a list - let's look at its content. 
# Can loop through all objects and delete.
search_result

In [None]:
# Can delete in two ways - by referencing the ID, or via objects returned in gis.content search
# Option 1:
item_id = search_result[0].id  # named item_id to avoid shadowing the built-in id()
print(item_id)
# gis.content.get(item_id).delete()

In [None]:
# Option 2:

# gis.content.search(query="california_coastline.zip")[0].delete()

### Publishing a web layer

Once a file (shapefile, csv, tile package, geodatabase, etc.) is added to Content, it can be published as a web layer, so that it can be accessed from web apps. 
#### Use item_name.publish (or click Publish in ArcGIS Online UI)

In [None]:
# check if this item is already published as a Feature Layer:

search_result = gis.content.search(query="california_coastline owner:izaslavsky_ucsd", max_items=100, item_type="Feature Layer")
count=0
for item in search_result:
    print(count,item)
    count+=1

In [None]:
# and delete, if needed:

gis.content.search(query="california_coastline owner:izaslavsky_ucsd", max_items=100, item_type="Feature Layer")[0] #.delete()

In [None]:
coastline_feature_layer_item = gis.content.search(query="california_coastline.zip",max_items=100)[0].publish({"name":"CA Coastline IZ25"})
coastline_feature_layer_item

# Let's look at it in ArcGIS Online! It is a hosted feature layer. You can do a lot more with such layers than with shapefiles

# !! IMPORTANT: Can't create a service if a service with this name already exists IN ORGANIZATION. Make sure the name is unique

In [None]:
# Optional: to check if the item already exists, and delete it as needed

search_result = gis.content.search(query="California coastline", max_items=100)
search_result

In [None]:
# we can delete the hosted feature layer just created:
gis.content.search(query="California coastline IZ24")[0].delete()


In [None]:
# let's make sure we deleted the feature layer, and still have the shapefile there
search_result = gis.content.search(query="California coastline owner:izaslavsky_ucsd", max_items=100)
search_result

As you can see, it is easy to add or remove different types of items on AGOL. But make sure that you remove only the items you intended to remove. The safest way is to get the item's unique ID, and use that ID to delete it.
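
A minimal sketch of a guarded delete by ID. The helper `delete_item_by_id` is hypothetical; it relies only on `gis.content.get()`, the item's `title`, and `item.delete()`, all of which appear in the cells above. The title check is an extra safeguard of our own.

```python
def delete_item_by_id(gis, item_id, expected_title=None):
    """Delete an item only if it exists and (optionally) has the expected title.

    Returns True when the item was deleted, False otherwise.
    """
    item = gis.content.get(item_id)
    if item is None:
        # nothing with this ID is visible to the current user
        return False
    if expected_title is not None and item.title != expected_title:
        # the ID points at a different item than we thought: do nothing
        return False
    return bool(item.delete())


# Usage, assuming item_id was printed and checked beforehand:
# delete_item_by_id(gis, item_id, expected_title="California coastline")
```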


### Sharing your data
#### You need to share the data that you want us to see (e.g., as part of your MP submissions)

You can share with Everyone, Organization, or specific Groups.

We have groups where students share their final projects. For example, "DSC 170 Past Projects" (https://ucsdonline.maps.arcgis.com/home/group.html?id=441a0879ccaf48a9ba6920dcd3cb2d68), or "DSC170 Winter23 Final Projects" (https://ucsdonline.maps.arcgis.com/home/group.html?id=6afe43c520844816bd4323d8f1a6d2ec)

Project teams can also create groups for their projects (make sure to set up groups where members can edit items from other members! This setting cannot be changed later.) You will need these groups for final projects as well. Eventually, we'll create a gallery of DSC170 final projects from this quarter as a separate group.

In [None]:
# explore the coastline_feature_layer_item:
search_result[1]

In [None]:
coastline_feature_layer_item=gis.content.get('651d1ee2598444b994412fdcba892d39')
coastline_feature_layer_item

In [None]:
# initially, item access is private:

print(coastline_feature_layer_item.id)
print(coastline_feature_layer_item.access)

In [None]:
# Here is an example of how you can share a data item with an existing group

# You can search for a group object, and then directly reference it by ID

dsc170projectgroup_search = gis.groups.search("DSC 170 Past Projects")
print(dsc170projectgroup_search)
print(dsc170projectgroup_search[0].id)


In [None]:
# reference an existing group by ID:
dsc170group = gis.groups.get('441a0879ccaf48a9ba6920dcd3cb2d68')
dsc170group

#### Why is this important?
When you submit a notebook as part of an assignment, and it refers to some layers that you created on AGOL - you want to share these layers with this group, so that we can run the notebook without data access issues. 
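
One way to find out what needs sharing is to scan the notebook source for item IDs. The sketch below assumes that IDs are 32-character lowercase hexadecimal strings, as in the examples in this notebook; note that group IDs have the same shape and will be picked up too.

```python
import re

# Assumption: item and group IDs look like 32 lowercase hex characters
ITEM_ID_PATTERN = re.compile(r"\b[0-9a-f]{32}\b")


def extract_item_ids(source_text):
    """Return the sorted, de-duplicated IDs mentioned in a piece of code."""
    return sorted(set(ITEM_ID_PATTERN.findall(source_text)))


notebook_source = """
coastline_feature_layer_item = gis.content.get('651d1ee2598444b994412fdcba892d39')
dsc170group = gis.groups.get('441a0879ccaf48a9ba6920dcd3cb2d68')
"""

for found_id in extract_item_ids(notebook_source):
    print(found_id)
```

Each ID found this way can be looked up and its `access` property checked before submission.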

In [None]:
# coastline_feature_layer_item=gis.content.get('e8678a93fcd1413aa7e287dcad948a8a')
coastline_feature_layer_item.access
# Here is how you share your data:
coastline_feature_layer_item.id
# coastline_feature_layer_item.share(everyone = True)
# coastline_feature_layer_item.share(org=True)
# coastline_feature_layer_item.share(everyone=False)


# Can reference a list containing group objects, or group IDs, or group names
# coastline_feature_layer_item.share(groups=[dsc170group])

# coastline_feature_layer_item.share(groups=['DSC 170 Past Projects'])
# coastline_feature_layer_item.share(groups=['f52c1a932d954687b7d211f0de8d4b01'])

In [None]:
coastline_feature_layer_item.access

### A different mechanism is used to add data FROM A PANDAS DATAFRAME. 
(can also ingest ArcGIS "Spatially Enabled DataFrame" - more about it later)

How you add such data depends on its size:
* Small tables (less than 1000 records) can be added using gis.content.import_data(df)
* Larger tables need to be set up as "Spatially-Enabled Data Frames" (SEDF)
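
The choice between the two routes, plus a basic coordinate check, can be sketched in plain Python. The 1000-record threshold is the one quoted above; the function names, route labels, and sample records are hypothetical.

```python
IMPORT_DATA_LIMIT = 1000  # threshold quoted above; verify for your API version


def choose_publish_route(n_records, limit=IMPORT_DATA_LIMIT):
    """Pick a publishing route from the record count."""
    return "import_data" if n_records < limit else "sedf_to_featurelayer"


def split_valid_records(records, lat_key, lon_key):
    """Separate records with usable coordinates from those without."""
    valid, rejected = [], []
    for rec in records:
        lat, lon = rec.get(lat_key), rec.get(lon_key)
        ok = (
            isinstance(lat, (int, float)) and isinstance(lon, (int, float))
            and lat == lat and lon == lon          # NaN is never equal to itself
            and -90 <= lat <= 90 and -180 <= lon <= 180
        )
        (valid if ok else rejected).append(rec)
    return valid, rejected


# Made-up records, for illustration only
records = [
    {"id": 1, "lat": -1.29, "lon": 36.82},
    {"id": 2, "lat": None, "lon": 18.42},
    {"id": 3, "lat": float("nan"), "lon": 3.38},
]
valid, rejected = split_valid_records(records, "lat", "lon")
print(len(valid), len(rejected), choose_publish_route(len(valid)))
```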

#### Example of importing pandas dataframe:
Let's take these iNaturalist data at 
http://suave2.sdsc.edu/main/file=suavedemos_AfriBats_iNaturalist_Photos.csv&view=map


In [None]:
# let's try to import this dataset via gis.content.import_data 

import pandas as pd
from arcgis import features

csv1 = data_location +'world/afribats3.csv'

afribat_data = pd.read_csv(csv1)
print("Number of records : " + str(len(afribat_data)))
afribat_data.head()


In [None]:
afribat_featureservice = gis.content.import_data(
    df=afribat_data,
    location_type="coordinates",
    latitude_field="latitude#number#hidden",
    longitude_field="longitude#number#hidden",
    sanitize_columns=True
)


# this has a limit of 1000 records only!! Keep this in mind. May or may not work!

In [None]:
# if you try publishing a small fragment  - it works:

afribat_data = afribat_data.dropna(subset=["latitude#number#hidden", "longitude#number#hidden"])

In [None]:
afribat_sample = afribat_data.head(10)

afribat_featureservice = gis.content.import_data(
    df=afribat_sample,
    location_type="coordinates",
    latitude_field="latitude#number#hidden",
    longitude_field="longitude#number#hidden",
    sanitize_columns=True
)



In [None]:
afribat_featureservice.properties

In [None]:
# another way is to convert to Spatially-Enabled Data Frame (SEDF) first, and then publish :

sdf = pd.DataFrame.spatial.from_xy(afribat_data,x_column = 'longitude#number#hidden', y_column='latitude#number#hidden')


# Other ways to create a SEDF from pandas: from a list of addresses (from_table)
# from feature class, from layer...
# this is similar to a GeoDataFrame in GeoPandas

In [None]:
# explore the content of this new table. What does it resemble?
sdf

In [None]:
# testing:
from arcgis.features import GeoAccessor, GeoSeriesAccessor

# Assuming 'sdf' is your spatially-enabled DataFrame with point geometries

# Extract latitude and longitude from the SHAPE column
sdf['lat'] = sdf['SHAPE'].apply(lambda geom: geom.y)
sdf['lon'] = sdf['SHAPE'].apply(lambda geom: geom.x)




In [None]:
sdf

In [None]:
# notice that internally, when creating a feature layer from a SEDF, ArcGIS first adds a shapefile, 
# and then creates a feature layer from it (refresh Content of ArcGIS Online to see)

sdf_fl = sdf.spatial.to_featurelayer(title ='afriba_iz', gis=gis, tags="afri", sanitize_columns=True) 

# this process expects a clean SEDF: all records should have lat/lon. 
# Also, shapefiles don't handle boolean variables natively, because the dBase format of .dbf files is old. 
# As a result, TRUE and FALSE are often represented as integers, or characters.

# When you get an error referring to a failure of casting values as integers, this is usually because of such logical variables.

# There may be more limitations. Do this publishing carefully!

# sanitize_columns=True ensures that column names are 10 characters or less 

In [None]:
# View the layer you just published from SEDF
sdf_fl

#### Do you see any problems with this procedure?
Hint: explore the layer being created in your AGOL content, and think about its limitations.

#### Limitations of shapefiles - we already know several
* Size limitations: up to 2 GB.
* Shapefiles can only store attribute data in a single table, so you cannot store multiple tables or relationships between data in a shapefile.
* Coordinate precision: only six decimal places of precision.
* Limited geometry types: points, lines, and polygons.
* Limited metadata: no standard way to store metadata.
* Attribute tables are based on the dBase (.dbf) structure, which has limitations, e.g., with respect to column names.
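
To see what the column-name limitation implies, here is a sketch of shortening names to 10 characters while keeping them unique, and of storing booleans as integers. This is our own illustration; it is not the algorithm that `sanitize_columns=True` uses.

```python
import re


def shorten_field_names(names, max_len=10):
    """Map original column names to unique names of at most max_len characters."""
    used = set()
    mapping = {}
    for name in names:
        # replace runs of non-alphanumeric characters with an underscore
        base = re.sub(r"\W+", "_", name).strip("_") or "field"
        candidate = base[:max_len]
        k = 1
        # compare case-insensitively and add a numeric suffix on collisions
        while candidate.lower() in used:
            suffix = f"_{k}"
            candidate = base[: max_len - len(suffix)] + suffix
            k += 1
        used.add(candidate.lower())
        mapping[name] = candidate
    return mapping


def bool_to_int(value):
    """Represent True/False as 1/0; leave other values unchanged."""
    return int(value) if isinstance(value, bool) else value


columns = ["latitude#number#hidden", "temperature_min", "temperature_max"]
print(shorten_field_names(columns))
print([bool_to_int(v) for v in [True, False, "n/a"]])
```

Truncation can make distinct columns collide, which is why a mapping from old to new names is worth keeping alongside the published layer.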


In [None]:
# Now we can show this layer on a map.
# Notice how a map gets constructed: you define the map (possibly giving a placename), add one or more layers, then display it

m = gis.map()
m.content.add(sdf_fl)
m


# Where is the data??

In [None]:
import arcgis.mapping
print(dir(arcgis.mapping))


### Advantages of an integrated enterprise system

1. Manages a data catalog, with different types of resources accessible by different groups of people, within and outside an organization. The catalog can be searched and managed from code.
1. Large data can be served from ArcGIS Online and servers. 
1. Several APIs for the same set of functions (Javascript, Python)
1. Ease of data integration: can use foundational data to enrich any polygons ("geoenrichment") to support data analysis
1. From desktop packages one can now interact with content of ArcGIS online and servers, and launch Jupyter notebooks. 
1. Specialization of different components used to be better defined: desktop for managing and cleaning large volumes of data, advanced analysis and workflows; servers for serving data into web maps and enabling analysis over pre-built datasets. But ArcGIS Online and Python API have been developing rapidly, and adopting functions that were earlier available only on desktop. 
1. There are a number of application builders for different platforms and tasks. 
1. Ease of use, scalability, and professional support.

But things can go wrong. For example, web map applications (which are background layers + operational layers + tools/widgets) may be built from independently developed feature layers. When the layers change, web maps break. ArcGIS Python developer examples include scripts to detect that. 
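
The idea behind such a check can be sketched as follows. The dictionary layout (`operationalLayers`, `itemId`, `title`, `url`) follows our understanding of web map JSON and should be treated as an assumption; the web map, the IDs, and the helper name are made up.

```python
def find_broken_layers(webmap_json, item_is_accessible):
    """List titles of operational layers whose source item cannot be reached.

    item_is_accessible: a callable taking an item ID and returning True/False.
    Layers without an itemId (e.g. referenced only by URL) are skipped here.
    """
    broken = []
    for layer in webmap_json.get("operationalLayers", []):
        item_id = layer.get("itemId")
        if item_id and not item_is_accessible(item_id):
            broken.append(layer.get("title", item_id))
    return broken


# Hypothetical web map definition and a stand-in for the catalog lookup
webmap = {
    "operationalLayers": [
        {"title": "Coastline", "itemId": "a" * 32},
        {"title": "Bat observations", "itemId": "b" * 32},
        {"title": "External tiles", "url": "https://example.com/tiles"},
    ]
}
existing_ids = {"a" * 32}

print(find_broken_layers(webmap, lambda item_id: item_id in existing_ids))
```

With a live connection, the lookup could be something like `lambda item_id: gis.content.get(item_id) is not None`, though the exact behavior for inaccessible items may differ between API versions.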


### As earlier with GeoPandas, we'll start with map drawing

In [None]:
# Towards Smart Mapping,
# where optimal renderers are suggested by the system
# However, the number of renderers is limited, not sufficient for a full-fledged map authoring
# For better map authoring, use desktop systems. E.g., ArcGIS Pro can now export in Adobe Illustrator format

from arcgis.geocoding import geocode


In [None]:
# Let's initialize a map, and create a map of UCSD, then draw a bike path over it
map1 = gis.map()
map1


In [None]:
map1.basemap.basemap = 'dark-gray-vector'

In [None]:
# a complete list of backgrounds
print(map1.basemap.basemaps)

In [None]:
# We will map the first of the locations returned on geocoding request for string "UC San Diego"

location = geocode("UC San Diego")[0]
location['extent'].update({'spatialReference': {'wkid': 4326}})
map1.extent = location['extent']
map1.height = '800px'


In [None]:
# let's convert location (which is a dict - you can check it) into a layer, and add it to the map

from arcgis.features import FeatureSet, Feature
from arcgis.geometry import Geometry

# Extract geometry and attributes
geometry = {
    "x": location["location"]["x"],
    "y": location["location"]["y"],
    "spatialReference": {"wkid": 4326},
}

attributes = location["attributes"]

# Create a Feature
feature = Feature(geometry=geometry, attributes=attributes)

# Create a FeatureSet
feature_set = FeatureSet([feature])

In [None]:
map1.content.add(feature_set)

### Mapping in 3D

In [None]:
from arcgis.map import Scene

In [None]:
scene1 = Scene()
scene1

In [None]:
world_countries_item = gis.content.get('ac80670eb213440ea5899bbf92a04998')
world_countries_layer = world_countries_item.layers[0]
world_countries_layer

In [None]:
scene1.content.add(world_countries_layer)

In [None]:
# now, let's add UCSD there

# Extract geocoded data into a Pandas DataFrame
data = {
    "address": [location["address"]],
    "x": [location["location"]["x"]],
    "y": [location["location"]["y"]],
    **location["attributes"]
}
df = pd.DataFrame(data)

# Convert Pandas DataFrame to a Spatially-Enabled DataFrame
sedf = GeoAccessor.from_xy(df, x_column="x", y_column="y", sr=4326)

In [None]:
# Convert the SEDF into a Feature Layer
feature_layer = sedf.spatial.to_featurelayer(title="Geocoded Point Layer")

In [None]:
# and add it to the scene
scene1.content.add(feature_layer)

In [None]:
# What layers are shown on a map or a scene?

for idx,lyr in enumerate(scene1.content.layers):
    print(f"{idx:<6}{lyr}")

In [None]:
# let's remove this layer
scene1.content.remove(0)

## Example of adding geometry to pandas df and then publishing on ArcGIS Online

In [None]:
map2 = gis.map()
map2

In [None]:
world_sdg1 = pd.read_csv(data_location+'world/Global_Index_Data_subset.csv')
world_sdg1

# But we don't have any spatial data here! What to do?



In [None]:
# let's find some world map by countries using gis.content.search
result = gis.content.search('title: World Countries (Generalized)', item_type="Feature Layer", outside_org=True, max_items=30)

from IPython.display import display

# show each result together with its index in the list
for counter, item in enumerate(result):
    print(counter)
    display(item)

In [None]:
# let's grab the layer with the 3-character country identifier
# and add it to the map

countries = result[0]
map2.content.add(countries)


# Good! We can use this layer. Now we just need to merge data from our pandas dataframe to this layer. 
# What variable to merge on?

In [None]:
# First, we move the spatial data from the found layer, into SEDF:

countries_df = pd.DataFrame.spatial.from_layer(countries.layers[0])

In [None]:
countries_df.head()

In [None]:
cm2 = countries_df.merge(world_sdg1,on='ISO_3DIGIT').spatial

# Note that ".spatial" returns the spatial accessor (GeoAccessor) of the merged SEDF, which provides methods such as plot()

In [None]:
cm2
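
The merge above uses the default inner join in pandas, so countries whose codes do not match are silently dropped. A quick way to see what would be lost is to compare the key sets first. The sketch below uses plain Python sets and made-up code lists; with the real data, the inputs would be the `ISO_3DIGIT` columns of the two tables.

```python
def join_report(left_keys, right_keys):
    """Summarize which join keys match and which appear on one side only."""
    left, right = set(left_keys), set(right_keys)
    return {
        "matched": sorted(left & right),
        "left_only": sorted(left - right),
        "right_only": sorted(right - left),
    }


# Illustrative codes only
layer_codes = ["USA", "MEX", "CAN", "FRA"]
table_codes = ["USA", "MEX", "BRA"]

report = join_report(layer_codes, table_codes)
for part, codes in report.items():
    print(part, codes)
```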


In [None]:
print(type(cm2))

In [None]:
# since cm2 is the spatial accessor of a SEDF, we can plot it directly, then add rendering 

map3 = gis.map()
cm2.plot(map_widget=map3)

In [None]:
# Smart Mapping simplifies rendering:
renderer_manager = map3.content.renderer(0)
smm = renderer_manager.smart_mapping()
smm.class_breaks_renderer(break_type="size", field="GDP_per_capita_2016") 

# that showed the map with graduated symbols

In [None]:
renderer_manager = map3.content.renderer(0)
smm = renderer_manager.smart_mapping()
smm.class_breaks_renderer(break_type="color", field="GDP_per_capita_2016", classification_method="natural-breaks", num_classes=5) 

# that shows a choropleth map using color values
# classification_method: equal-interval, quantile, natural-breaks, standard-deviation
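
To make the classification methods less of a black box, here is a plain-Python sketch of how equal-interval and quantile class breaks can be computed. It is an illustration under our own conventions (each break is the upper bound of a class) and need not reproduce the exact breaks ArcGIS generates; the values are made up.

```python
from bisect import bisect_left
from statistics import quantiles


def equal_interval_breaks(values, num_classes):
    """Upper class bounds that split the value range into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_classes
    return [lo + width * i for i in range(1, num_classes + 1)]


def quantile_breaks(values, num_classes):
    """Upper class bounds that put roughly equal counts in each class."""
    return quantiles(values, n=num_classes) + [max(values)]


def class_index(value, breaks):
    """Index of the class a value falls into (upper bounds are inclusive)."""
    return min(bisect_left(breaks, value), len(breaks) - 1)


# Made-up, strongly skewed values
gdp = [500, 1200, 3000, 8000, 15000, 42000, 60000]

for name, breaks in [
    ("equal-interval", equal_interval_breaks(gdp, 3)),
    ("quantile", quantile_breaks(gdp, 3)),
]:
    print(name, [class_index(v, breaks) for v in gdp])
```

On skewed data, equal intervals tend to crowd most features into the lowest class, while quantiles spread them across classes; this is why the choice of method changes how a choropleth map reads.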

In [None]:
map3.legend.enabled=True

In [None]:
map3

In [None]:
# In the previous cell, we plotted cm2 and added it to map3.
# Another mechanism is adding feature layers to the map, as before.

map4 = gis.map("San Diego, CA")
map4



In [None]:

map4.basemap.basemap = "dark-gray-vector"


In [None]:
map4.basemap.basemaps