# 5-1: Using NumPy for ArcGIS

- NumPy: a numerical computing library that provides multidimensional arrays, linear algebra, and mathematical operations
- Offers fast operations on n-dimensional arrays
- The foundation of the Scientific Python Ecosystem
    - pandas
    - SciPy
    - matplotlib
    - Seaborn
    - scikit-learn
    - IPython
    - NetworkX
    - statsmodels
    
Helpful resources on NumPy:
- [NumPy official site](https://numpy.org/)
- [Tutorials on the scientific Python ecosystem](http://scipy-lectures.org/intro/numpy/index.html)

## 1. Import the NumPy package

In [1]:
import numpy as np # this is a convention

## 2. Get started with the basics

### 2.1 Recall a Python list

- Convert a list to a NumPy ndarray using `np.array()`

In [2]:
my_list = [1, 2, 3, 4]
my_list

[1, 2, 3, 4]

In [3]:
my_list = list(range(1, 5))
my_list

[1, 2, 3, 4]

In [4]:
np.array(my_list) 

array([1, 2, 3, 4])

In [5]:
type(np.array(my_list))

numpy.ndarray

- Create an ndarray with `np.arange()`: controls the interval width (step)

In [6]:
np.arange(1, 5)

array([1, 2, 3, 4])

In [7]:
np.arange(10)   # start defaults to 0

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [8]:
np.arange(0, 10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [9]:
np.arange(0, 10, 2)

array([0, 2, 4, 6, 8])

- Create an ndarray with `np.linspace()`: controls the number of elements

In [10]:
np.linspace(1, 5, 5) # start, stop, num

array([1., 2., 3., 4., 5.])

In [11]:
np.linspace(0, 5, 11)   # step = (5 - 0) / (11 - 1) = 0.5

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])

In [12]:
np.linspace(0, 5, 21)

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  , 2.25, 2.5 ,
       2.75, 3.  , 3.25, 3.5 , 3.75, 4.  , 4.25, 4.5 , 4.75, 5.  ])

In [13]:
np.linspace(0, 5, 26)   # without NumPy this takes considerably more code, e.g. an explicit loop

array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4,
       2.6, 2.8, 3. , 3.2, 3.4, 3.6, 3.8, 4. , 4.2, 4.4, 4.6, 4.8, 5. ])
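The two constructors can be checked against each other. The sketch below is illustrative only (the variable names are not part of the original notebook); it uses `retstep=True` so that `np.linspace()` also reports the spacing it computed, then rebuilds the same values with `np.arange()`.

```python
import numpy as np

# np.linspace() fixes the number of points and derives the spacing.
values, step = np.linspace(0, 5, 11, retstep=True)
print(step)  # spacing = (stop - start) / (num - 1)

# np.arange() fixes the spacing and derives the number of points.
# The stop value is excluded, so go one step past 5 to include it.
same_values = np.arange(0, 5 + step, step)

print(np.allclose(values, same_values))
```

With non-integer steps, `np.linspace()` is usually the safer choice because the number of elements does not depend on floating-point rounding.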

### 2.2 Indexing and slicing on NumPy array

- indexing starts at **0**, as usual
- a negative index counts from the end of the array
- use a colon to select multiple elements, which are returned as an array
- a NumPy array can also be sliced with a step

In [14]:
my_arr = np.arange(10)
my_arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [15]:
my_arr[4]

4

In [16]:
my_arr[-2]

8

In [17]:
my_arr[1:6]

array([1, 2, 3, 4, 5])

In [18]:
my_arr[1:6:2]

array([1, 3, 5])

In [19]:
my_arr[:]

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [20]:
my_arr[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

## 3. Generate random numbers with NumPy

In [21]:
np.random.randint(2, 10)

6

In [22]:
np.random.randint(10)

5

In [23]:
np.random.randint(0, 10, 20)

array([3, 1, 8, 5, 2, 2, 7, 3, 5, 1, 0, 1, 8, 7, 0, 0, 4, 6, 5, 2])

In [24]:
np.random.rand(3,2) # from a uniform distribution over `[0, 1)`

array([[0.74538232, 0.41584597],
       [0.35122428, 0.03901372],
       [0.2887653 , 0.38537612]])
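The values above change on every run, and the upper bound passed to `np.random.randint()` is excluded. When repeatable results are needed, the generator can be seeded. The following is a minimal sketch; the seed value `42` and the variable names are arbitrary choices, not part of the original notebook.

```python
import numpy as np

# Seed the legacy global generator so the draws below repeat on every run.
np.random.seed(42)  # 42 is an arbitrary choice
first = np.random.randint(0, 10, 5)

np.random.seed(42)
second = np.random.randint(0, 10, 5)

print(np.array_equal(first, second))  # same seed, same sequence

# The newer Generator API offers the same kinds of draws.
rng = np.random.default_rng(42)
ints = rng.integers(0, 10, size=5)   # integers in [0, 10)
floats = rng.random((3, 2))          # uniform floats in [0, 1)
```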

## 4. Work with NumPy in ArcGIS

In [25]:
import arcpy

In [26]:
gdb_worksp = r"D:\Dropbox (UFL)\URP6271\urp6271_spring2022\class_data.gdb"
arcpy.env.workspace = gdb_worksp

In [27]:
arcpy.ListFeatureClasses()

['county_boundary',
 'hospitals',
 'schools',
 'I75',
 'roads',
 'law_enforcement',
 'major_highways',
 'zip_boundaries',
 'major_roads',
 'landuse',
 'crash',
 'blockgroups',
 'I75_2mile_buff',
 'schools_2mile_I75',
 'blockgroups_school_spjoin',
 'zipbnd_q1',
 'zipbnd_q2',
 'blockgroups_Layer2_CopyFeatures']

In [28]:
school_fc = "schools"

### 4.1 Retrieve field names of a feature class with `arcpy.ListFields()`

- returns a list of [field objects](https://pro.arcgis.com/en/pro-app/2.7/arcpy/classes/field.htm)
- Code example:

```python
import arcpy

feature_class = "c:/data/counties.shp"

# Create a list of fields using the ListFields function
fields = arcpy.ListFields(feature_class)

# Iterate through the list of fields
for field in fields:
    # Print field properties
    print("Field:       {0}".format(field.name))
    print("Alias:       {0}".format(field.aliasName))
    print("Type:        {0}".format(field.type))
    print("Is Editable: {0}".format(field.editable))
    print("Required:    {0}".format(field.required))
    print("Scale:       {0}".format(field.scale))
    print("Precision:   {0}".format(field.precision))
```

In [29]:
school_fields = []
for field in arcpy.ListFields(school_fc):
    school_fields.append(field.name)
print(school_fields)

['OBJECTID_1', 'Shape', 'OBJECTID', 'STATUS', 'SCORE', 'SIDE', 'MATCH_ADDR', 'FEDERAL_ID', 'STATE_ID', 'SCHOOL_ID', 'NAME', 'ADDRESS', 'CITY', 'ZIPCODE', 'PHONE', 'COUNTY', 'OPERATING', 'OP_CLASS', 'ENROLLMENT', 'PROGRAMS', 'COMMON_USE', 'USE', 'TYPE', 'ACTIVITY', 'GRADES', 'LOW_GRADE', 'HIGH_GRADE', 'PRINCIPAL', 'TEACHERS', 'STDTCH_RT', 'MIGRNT_STD', 'TITLE1SCHO', 'MAGNETINFO', 'FREE_LUNCH', 'REDUCED_LU', 'FISH_FAC1', 'FISH_FAC2', 'COMMENTS', 'BBSERVICE', 'BBPROVIDER', 'BBSPEED', 'DSTREAMSPD', 'YR_BUILT', 'PARCEL_ID', 'LAT_DD', 'LONG_DD', 'USNG_FL_1K', 'FDOE_MSID', 'NCES_PUB', 'NCES_PRIV', 'FDOE_PRV', 'SOURCE', 'DESCRIPT', 'FLAG', 'UPDATE_DAY', 'FGDLAQDATE', 'AUTOID']


### 4.2 Convert a feature class to NumPy array

- `arcpy.da`: the **_[Data Access](https://pro.arcgis.com/en/pro-app/2.8/arcpy/data-access/what-is-the-data-access-module-.htm)_** module
- `arcpy.da.FeatureClassToNumPyArray` converts a feature class to a **_Structured Array_**
- learn more about [structured arrays](https://docs.scipy.org/doc/numpy/user/basics.rec.html) 
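
To see what a structured array looks like without a geodatabase at hand, one can be built by hand. This is a small sketch with made-up records (`SCHOOL A` and so on are hypothetical, as are the names `demo_dtype` and `demo_arr`); it only mimics the kind of layout that `arcpy.da.FeatureClassToNumPyArray` produces.

```python
import numpy as np

# Hypothetical records that mimic the layout arcpy returns:
# one tuple per row, one named field per column.
demo_dtype = [("NAME", "U30"), ("OP_CLASS", "U12"), ("ENROLLMENT", "f8")]
demo_arr = np.array(
    [
        ("SCHOOL A", "PRIVATE", 53.0),
        ("SCHOOL B", "PUBLIC", 358.0),
        ("SCHOOL C", "PUBLIC", 0.0),
    ],
    dtype=demo_dtype,
)

print(demo_arr.dtype.names)     # field names
print(demo_arr.shape)           # one-dimensional: one entry per record
print(demo_arr["ENROLLMENT"])   # a whole field (column)
print(demo_arr[0]["NAME"])      # one field of one record
```

Note that the array is one-dimensional even though it has several fields: each element is a whole record.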

In [30]:
school_arr = arcpy.da.FeatureClassToNumPyArray(
    school_fc, ["NAME", 'OP_CLASS', 'ENROLLMENT', 'TYPE', 'TEACHERS']
)

In [31]:
school_arr

array([('GRACE CHRISTIAN SCHOOL OF ALACHUA CO.', 'PRIVATE',    0., 'SENIOR HIGH',   0. ),
       ('FAMILY LIFE ACADEMY', 'PRIVATE',    0., 'COMBINATION ELEMENTARY & SECONDARY',   0. ),
       ('FOREST GROVE CHRISTIAN ACADEMY', 'PRIVATE',   53., 'COMBINATION ELEMENTARY & SECONDARY',   9.4),
       ('VAISHNAVA ACADEMY FOR GIRLS', 'PRIVATE',   19., 'COMBINATION JR. HIGH & SENIOR HIGH',   0. ),
       ('BHAKTIVEDANTA ACADEMY', 'PRIVATE',    0., 'COMBINATION ELEMENTARY & MIDDLE',   0. ),
       ('DESTINY CHRISTIAN ACADEMY', 'PRIVATE',    0., 'COMBINATION ELEMENTARY & MIDDLE',   0. ),
       ('INCAF MONTESSORI SCHOOL', 'PRIVATE',    0., 'ELEMENTARY',   0. ),
       ('GREAT AMERICAN VISIONS ENTERPRISES,INC', 'PRIVATE',    0., 'COMBINATION ELEMENTARY & MIDDLE',   0. ),
       ('JORDAN GLEN SCHOOL INC.', 'PRIVATE',  115., 'COMBINATION ELEMENTARY & MIDDLE',  14.5),
       ('QUEEN OF PEACE CATHOLIC ACADEMY', 'PRIVATE',  358., 'COMBINATION ELEMENTARY & MIDDLE',  28.4),
       ('THE ROCK SCHOOL', '

Retrieve the **shape** (or length) of an array

In [32]:
len(school_arr)

112

In [33]:
school_arr.shape

(112,)

View the first five elements of a structured array

In [34]:
school_arr[:5]

array([('GRACE CHRISTIAN SCHOOL OF ALACHUA CO.', 'PRIVATE',  0., 'SENIOR HIGH', 0. ),
       ('FAMILY LIFE ACADEMY', 'PRIVATE',  0., 'COMBINATION ELEMENTARY & SECONDARY', 0. ),
       ('FOREST GROVE CHRISTIAN ACADEMY', 'PRIVATE', 53., 'COMBINATION ELEMENTARY & SECONDARY', 9.4),
       ('VAISHNAVA ACADEMY FOR GIRLS', 'PRIVATE', 19., 'COMBINATION JR. HIGH & SENIOR HIGH', 0. ),
       ('BHAKTIVEDANTA ACADEMY', 'PRIVATE',  0., 'COMBINATION ELEMENTARY & MIDDLE', 0. )],
      dtype=[('NAME', '<U100'), ('OP_CLASS', '<U12'), ('ENROLLMENT', '<f8'), ('TYPE', '<U35'), ('TEACHERS', '<f8')])

In [35]:
school_arr[0:5]

array([('GRACE CHRISTIAN SCHOOL OF ALACHUA CO.', 'PRIVATE',  0., 'SENIOR HIGH', 0. ),
       ('FAMILY LIFE ACADEMY', 'PRIVATE',  0., 'COMBINATION ELEMENTARY & SECONDARY', 0. ),
       ('FOREST GROVE CHRISTIAN ACADEMY', 'PRIVATE', 53., 'COMBINATION ELEMENTARY & SECONDARY', 9.4),
       ('VAISHNAVA ACADEMY FOR GIRLS', 'PRIVATE', 19., 'COMBINATION JR. HIGH & SENIOR HIGH', 0. ),
       ('BHAKTIVEDANTA ACADEMY', 'PRIVATE',  0., 'COMBINATION ELEMENTARY & MIDDLE', 0. )],
      dtype=[('NAME', '<U100'), ('OP_CLASS', '<U12'), ('ENROLLMENT', '<f8'), ('TYPE', '<U35'), ('TEACHERS', '<f8')])

View the 1st, 5th, 10th, and 100th elements of the array

- supply the indices as a Python list

In [36]:
school_arr[[0, 4, 9, 99]]

array([('GRACE CHRISTIAN SCHOOL OF ALACHUA CO.', 'PRIVATE',   0., 'SENIOR HIGH',  0. ),
       ('BHAKTIVEDANTA ACADEMY', 'PRIVATE',   0., 'COMBINATION ELEMENTARY & MIDDLE',  0. ),
       ('QUEEN OF PEACE CATHOLIC ACADEMY', 'PRIVATE', 358., 'COMBINATION ELEMENTARY & MIDDLE', 28.4),
       ('HORIZON CENTER. ALTERNATIVE SCHOOL', 'PUBLIC',  79., 'COMBINATION JR. HIGH & SENIOR HIGH', 17. )],
      dtype=[('NAME', '<U100'), ('OP_CLASS', '<U12'), ('ENROLLMENT', '<f8'), ('TYPE', '<U35'), ('TEACHERS', '<f8')])

Retrieve a **field** (column) from a structured array: using the field name

In [37]:
school_arr['ENROLLMENT']

array([   0.,    0.,   53.,   19.,    0.,    0.,    0.,    0.,  115.,
        358.,  207.,   52.,  260.,  106.,   84.,  209.,   22.,    0.,
        246.,  216.,  264.,   45.,   19.,    0.,    0.,  753.,    0.,
        211.,    0.,   55.,    0.,  336.,   43.,    0.,   32.,    0.,
         16.,  116.,  443.,    0.,   76.,  583.,  520.,  596.,  938.,
          0.,    0.,   39.,   24.,  132.,  259., 1129.,  453.,  167.,
        521.,  446.,    0.,    0.,    0., 2221.,  439., 1139.,  711.,
        801.,  836.,  711.,  731.,  657.,  628., 1047.,   47.,  463.,
       1928.,  466.,   71.,  105.,    0.,   67.,   19.,    0.,  932.,
        644.,   53.,  106.,  570.,   86.,   35.,  131.,  100.,  709.,
        572.,    0.,  445.,  717.,  531.,  114.,  395.,   63.,  109.,
         79.,  408., 1549.,  202.,  193.,  395.,  215.,    0.,    0.,
          0.,    0.,    0.,  365.])

### 4.3 Compute statistics of an ndarray

- maximum: `np.max()`
- minimum: `np.min()`
- mean: `np.mean()`
- standard deviation: `np.std()`

In [38]:
enroll_arr = school_arr['ENROLLMENT']

In [39]:
np.max(enroll_arr)

2221.0

In [40]:
np.min(enroll_arr)

0.0

In [41]:
np.mean(enroll_arr)

294.35714285714283

In [42]:
np.std(enroll_arr)

397.6106732080762
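
One detail worth knowing: by default `np.std()` computes the population standard deviation (it divides by `n`). Passing `ddof=1` gives the sample standard deviation (it divides by `n - 1`). The sketch below uses a small made-up array, not the school data, to show the difference.

```python
import numpy as np

# Made-up values chosen so the population result is easy to check by hand.
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = np.std(sample)        # default ddof=0, divides by n
sample_std = np.std(sample, ddof=1)    # divides by n - 1

print(population_std)
print(sample_std)
```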

### 4.4 Simple queries against a NumPy array

- which school has the largest enrollment: `argmax`
- which school has the smallest enrollment: `argmin`

In [43]:
enroll_arr.argmax() # returns the index of the largest value

59

In [44]:
school_arr['NAME'][enroll_arr.argmax()]

'BUCHHOLZ HIGH SCHOOL'

In [45]:
school_arr['NAME'][enroll_arr.argmin()]

'GRACE CHRISTIAN SCHOOL OF ALACHUA CO.'
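
Many schools have an enrollment of 0, yet `argmin()` returned a single name. That is because `argmin()` and `argmax()` return only the first index that holds the extreme value. A sketch of how to list every tie, using made-up arrays (`names` and `enrollment` are hypothetical stand-ins for the school fields):

```python
import numpy as np

names = np.array(["SCHOOL A", "SCHOOL B", "SCHOOL C", "SCHOOL D"])
enrollment = np.array([0.0, 120.0, 0.0, 45.0])

# argmin() reports only the first position that holds the minimum.
print(enrollment.argmin())

# A boolean comparison finds every position tied for the minimum.
tied = np.flatnonzero(enrollment == enrollment.min())
print(tied)
print(names[tied])
```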

### 4.5 Generate new arrays based on a conditional statement

- schools whose enrollment is positive
- schools that are public

In [46]:
enroll_arr > 0 # returns an ndarray of booleans

array([False, False,  True,  True, False, False, False, False,  True,
        True,  True,  True,  True,  True,  True,  True,  True, False,
        True,  True,  True,  True,  True, False, False,  True, False,
        True, False,  True, False,  True,  True, False,  True, False,
        True,  True,  True, False,  True,  True,  True,  True,  True,
       False, False,  True,  True,  True,  True,  True,  True,  True,
        True,  True, False, False, False,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True, False,  True,  True, False,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True, False,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True, False, False,
       False, False, False,  True])

In [47]:
school_type_arr = school_arr['OP_CLASS']
school_type_arr == 'PUBLIC'

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False,  True, False, False, False, False, False, False, False,
       False,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True, False])

Use an array of booleans to select from an ndarray

- the array of booleans must have the same shape as the original array

In [48]:
num_arr = np.array([1, 2, 3, 4])
num_arr

array([1, 2, 3, 4])

In [49]:
bool_arr = np.array([True, False, True, True])

In [50]:
num_arr[bool_arr]

array([1, 3, 4])

In [51]:
# Find the schools that do not have positive enrollment
# np.invert() flips each boolean (element-wise NOT)
school_arr[np.invert(enroll_arr > 0)]['NAME']

array(['GRACE CHRISTIAN SCHOOL OF ALACHUA CO.', 'FAMILY LIFE ACADEMY',
       'BHAKTIVEDANTA ACADEMY', 'DESTINY CHRISTIAN ACADEMY',
       'INCAF MONTESSORI SCHOOL',
       'GREAT AMERICAN VISIONS ENTERPRISES,INC',
       'GAINESVILLE CONDUCTIVE EDUCATION ACADEMY',
       'GAINESVILLE CONDUCTIVE EDUCATION ACADEMY',
       'KIDS N ALL CHRISTIAN ACADEMY', 'FREEDOM CHRISTIAN ACADEMY',
       'FAITH TABERNACLE OF PRAISE SCHOOL OF MINISTRY', 'CITY COLLEGE',
       'COMPASSIONATE OUTREACH MINISTRIES',
       'OAK HILL COMMUNITY PRIVATE SCHOOL SYSTEM',
       'SANTA FE COLLEGE - DAVIS CENTER',
       'UNIVERSITY OF FLORIDA - COMPARITIVE MEDICINE',
       'NORTH AMERICAN FAMILY INSTITUTE ALACHUA ACADEMY',
       'UNIVERSITY OF FLORIDA', 'UNIVERSITY OF FLORIDA - AGRONOMY LAB',
       'SANTA FE COLLEGE - NORTHWEST CAMPUS', 'UNIVERSITY OF FLORIDA',
       'SANTA FE COMMUNITY COLLEGE',
       'ALACHUA COUNTY SUPERINTENDENT OFFICE - KIRBY-SMITH CENTER',
       'UNIVERSITY OF FLORIDA',
       'SANTA
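
Boolean arrays can also be combined, which allows both conditions listed at the start of this section to be applied at once. The sketch below uses short made-up arrays rather than the school data; `~` is the operator form of `np.invert()`.

```python
import numpy as np

# Hypothetical stand-ins for the ENROLLMENT and OP_CLASS fields.
enrollment = np.array([0.0, 53.0, 358.0, 0.0, 79.0])
op_class = np.array(["PRIVATE", "PRIVATE", "PUBLIC", "PUBLIC", "PUBLIC"])

is_public = op_class == "PUBLIC"
has_students = enrollment > 0

# & is element-wise AND, | is element-wise OR, ~ is element-wise NOT.
public_with_students = enrollment[is_public & has_students]
no_students = enrollment[~has_students]

# Parentheses are needed when the comparisons are written inline,
# because & binds more tightly than == and >.
inline = enrollment[(op_class == "PUBLIC") & (enrollment > 0)]

print(public_with_students)
print(no_students)
```

Python's `and`, `or`, and `not` keywords do not work element-wise on arrays, so the operator forms are used instead.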

### 4.6 Simple numeric operations

- operations between an array and a single number (applied to every element)
- operations between two arrays (their shapes must match or be broadcastable)

In [52]:
enroll_arr

array([   0.,    0.,   53.,   19.,    0.,    0.,    0.,    0.,  115.,
        358.,  207.,   52.,  260.,  106.,   84.,  209.,   22.,    0.,
        246.,  216.,  264.,   45.,   19.,    0.,    0.,  753.,    0.,
        211.,    0.,   55.,    0.,  336.,   43.,    0.,   32.,    0.,
         16.,  116.,  443.,    0.,   76.,  583.,  520.,  596.,  938.,
          0.,    0.,   39.,   24.,  132.,  259., 1129.,  453.,  167.,
        521.,  446.,    0.,    0.,    0., 2221.,  439., 1139.,  711.,
        801.,  836.,  711.,  731.,  657.,  628., 1047.,   47.,  463.,
       1928.,  466.,   71.,  105.,    0.,   67.,   19.,    0.,  932.,
        644.,   53.,  106.,  570.,   86.,   35.,  131.,  100.,  709.,
        572.,    0.,  445.,  717.,  531.,  114.,  395.,   63.,  109.,
         79.,  408., 1549.,  202.,  193.,  395.,  215.,    0.,    0.,
          0.,    0.,    0.,  365.])

In [53]:
enroll_arr * 2

array([   0.,    0.,  106.,   38.,    0.,    0.,    0.,    0.,  230.,
        716.,  414.,  104.,  520.,  212.,  168.,  418.,   44.,    0.,
        492.,  432.,  528.,   90.,   38.,    0.,    0., 1506.,    0.,
        422.,    0.,  110.,    0.,  672.,   86.,    0.,   64.,    0.,
         32.,  232.,  886.,    0.,  152., 1166., 1040., 1192., 1876.,
          0.,    0.,   78.,   48.,  264.,  518., 2258.,  906.,  334.,
       1042.,  892.,    0.,    0.,    0., 4442.,  878., 2278., 1422.,
       1602., 1672., 1422., 1462., 1314., 1256., 2094.,   94.,  926.,
       3856.,  932.,  142.,  210.,    0.,  134.,   38.,    0., 1864.,
       1288.,  106.,  212., 1140.,  172.,   70.,  262.,  200., 1418.,
       1144.,    0.,  890., 1434., 1062.,  228.,  790.,  126.,  218.,
        158.,  816., 3098.,  404.,  386.,  790.,  430.,    0.,    0.,
          0.,    0.,    0.,  730.])

In [54]:
enroll_arr + 100

array([ 100.,  100.,  153.,  119.,  100.,  100.,  100.,  100.,  215.,
        458.,  307.,  152.,  360.,  206.,  184.,  309.,  122.,  100.,
        346.,  316.,  364.,  145.,  119.,  100.,  100.,  853.,  100.,
        311.,  100.,  155.,  100.,  436.,  143.,  100.,  132.,  100.,
        116.,  216.,  543.,  100.,  176.,  683.,  620.,  696., 1038.,
        100.,  100.,  139.,  124.,  232.,  359., 1229.,  553.,  267.,
        621.,  546.,  100.,  100.,  100., 2321.,  539., 1239.,  811.,
        901.,  936.,  811.,  831.,  757.,  728., 1147.,  147.,  563.,
       2028.,  566.,  171.,  205.,  100.,  167.,  119.,  100., 1032.,
        744.,  153.,  206.,  670.,  186.,  135.,  231.,  200.,  809.,
        672.,  100.,  545.,  817.,  631.,  214.,  495.,  163.,  209.,
        179.,  508., 1649.,  302.,  293.,  495.,  315.,  100.,  100.,
        100.,  100.,  100.,  465.])

In [55]:
pos_school_arr = school_arr[school_arr['ENROLLMENT'] > 0]

In [56]:
# ratio of teachers to students
pos_school_arr["TEACHERS"] / pos_school_arr["ENROLLMENT"]

array([0.17735849, 0.        , 0.12608696, 0.07932961, 0.08647343,
       0.11346154, 0.07615385, 0.06132075, 0.13690476, 0.09138756,
       0.        , 0.04390244, 0.11481481, 0.04772727, 0.13555556,
       0.10526316, 0.08446215, 0.18720379, 0.        , 0.05208333,
       0.        , 0.065625  , 0.0625    , 0.07758621, 0.07110609,
       0.05263158, 0.05831904, 0.07115385, 0.05369128, 0.0575693 ,
       0.        , 0.        , 0.06818182, 0.08494208, 0.04340124,
       0.05960265, 0.05389222, 0.07101727, 0.07623318, 0.04502476,
       0.07744875, 0.1071115 , 0.07313643, 0.05867665, 0.06698565,
       0.06469761, 0.06839945, 0.07305936, 0.08041401, 0.05921681,
       0.06382979, 0.07019438, 0.04979253, 0.08154506, 0.05633803,
       0.23809524, 0.31343284, 0.        , 0.05686695, 0.06521739,
       0.        , 0.06132075, 0.07192982, 0.08139535, 0.02857143,
       0.04580153, 0.15      , 0.05994358, 0.06993007, 0.09438202,
       0.05997211, 0.06120527, 0.0877193 , 0.07088608, 0.07936

In [57]:
(pos_school_arr["TEACHERS"] / pos_school_arr["ENROLLMENT"]).argmax()

56

In [58]:
pos_school_arr['NAME'][(pos_school_arr["TEACHERS"] / pos_school_arr["ENROLLMENT"]).argmax()]

'A. QUINN JONES CENTER'
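
Filtering out zero-enrollment schools first avoids division by zero, but the resulting indices then refer to the filtered array rather than the original one. An alternative, sketched here with made-up numbers, is to divide only where the denominator is positive so that positions stay aligned with the original array. The `teachers` and `enrollment` arrays are hypothetical.

```python
import numpy as np

teachers = np.array([0.0, 9.4, 28.4, 0.0])
enrollment = np.array([0.0, 53.0, 358.0, 19.0])

# Divide only where enrollment is positive; other positions keep the
# NaN placeholder supplied through `out`.
ratio = np.divide(
    teachers,
    enrollment,
    out=np.full_like(enrollment, np.nan),
    where=enrollment > 0,
)

print(ratio)
print(np.nanargmax(ratio))  # index of the largest ratio, ignoring NaN
```

Because `ratio` has the same length as the input arrays, the index from `np.nanargmax()` can be used directly on the original array.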