# Pre-Processing CSV data for MapAggC
#### General Approach:
1) Read in csv sheet of data.  Export fields into new dataframe.
2) Remove bad rows and elements.
3) Fix date datatype.
4) For ReportingUnit df, include ID value for easy match to shp file in ArcGIS (see below).
5) Export completed df as processed csv.

#### Files Needed
1) dontopen_RU.csv
2) dontopen_AggLJAll.csv
3) dontopen_Date_dim.csv

In [1]:
#Needed Libraries
import os
import numpy as np
import pandas as pd
from datetime import datetime
pd.set_option('display.max_columns', 999)  # Display all columns of a Pandas DataFrame in Jupyter Notebook
pd.set_option('display.max_colwidth', None)  # Do not truncate long strings printed from a DataFrame (older pandas versions used -1)

  


## ReportingUnits Table and Export

#### General Notes
We drop the geometry field for now.  The API returns an encoded version of it (a long hex string), which we can't work with at this time.  Instead, we match the ReportingUnit_dim table to several separate shp files based on Unit Type (e.g., county, HUC8, custom), which requires building an ID value.  The ID and match approach differs for each Unit Type file; see below.

#### ReportingUnit csv to ArcGIS shp file notes.
Creating an ID to link the processed csv to the shp file works for every ReportingUnitType but HUC8, as the ArcGIS shp we use doesn't include the name in its data.  As a workaround, for all HUC8 types the format should be "2_" + !HUC_8! + "_" + !StateNum!.  For the time being, this was easier to do by hand than by code.
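The HUC8 key format described above could also be built in code. A minimal sketch, assuming the HUC code is available as a string or integer and should be zero-padded to 8 digits; the helper name `build_huc8_link_id` and the sample values are hypothetical, and which field actually carries the HUC code is not stated here:

```python
def build_huc8_link_id(huc8_code, state_num):
    """Return a key of the form "2_" + HUC_8 + "_" + StateNum.

    Hypothetical helper: huc8_code is assumed to be the HUC8 code,
    state_num the state number used for the other unit types.
    """
    # Assumption: codes read as integers may have lost a leading zero, so pad to 8 digits.
    huc8 = str(huc8_code).strip().zfill(8)
    return "2_" + huc8 + "_" + str(state_num).strip()


# Illustrative values only, not taken from the data.
print(build_huc8_link_id(16010101, "46"))
```

If the shp file stores HUC_8 without leading zeros, the padding step would need to be dropped so both sides of the match agree.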

Wyoming also includes "... River Planning Basin" in its ReportingUnitName.  It was easier to remove this by hand.  One WY shp file entry was changed from Salt-Snake to Snake-Salt.
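The Wyoming cleanup was done by hand, but a scripted version might look like the sketch below. It assumes the text to drop is the trailing " River Planning Basin" (adjust the pattern if the shp file keeps "River" in its names); the helper name and the sample input are hypothetical.

```python
import re

# Assumed suffix to remove; matched case-insensitively at the end of the name.
_WY_SUFFIX = re.compile(r"\s+River Planning Basin\s*$", flags=re.IGNORECASE)


def strip_planning_basin_suffix(name):
    """Return the name without a trailing ' River Planning Basin'."""
    return _WY_SUFFIX.sub("", name.strip())


# Illustrative input only.
print(strip_planning_basin_suffix("Snake-Salt River Planning Basin"))
```

Names without the suffix pass through unchanged, so the helper could be applied to the WY rows of ReportingUnitName without special-casing.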

California custom DAUCO also worked better using ReportingUnitNativeID.

AZ custom AM also worked better using ReportingUnitNativeID.  The shp didn't include a StateNum field, so I added it.

USBR worked okay with ReportingUnitName, but the shapefile had to be updated with StateNum field = 100.

In [2]:
#Working Directory and Input File
workingDir = "C:/Users/rjame/Documents/WSWC Documents/Portal Creation Research"
os.chdir(workingDir)

In [3]:
RU_Input = "dontopen_RU.csv"
df_RU = pd.read_csv(RU_Input)
df_RU

Unnamed: 0,ReportingUnitID,ReportingUnitUUID,ReportingUnitNativeID,ReportingUnitName,ReportingUnitTypeCV,ReportingUnitUpdateDate,ReportingUnitProductVersion,StateCV,EPSGCodeCV,Geometry
0,254,00-01-03,00-01-03,Curlew Valley,Subarea,,,UT,EPSG:4326,0xE610000001043C000000C4648C3091255CC0...
1,255,000-01-03,000-01-03,Clear Creek,Subarea,,,UT,EPSG:4326,0xE61000000104100000001B76B6C1604B5CC0...
2,256,00-07-02,00-07-02,Promontory Point,Subarea,,,UT,EPSG:4326,0xE610000001045600000033FF29DF4A175CC0...
3,257,000-01-02,000-01-02,Yost,Subarea,,,UT,EPSG:4326,0xE610000001041300000081EEF143D65C5CC0...
4,258,000-02-00,000-02-00,Goose Creek,Subarea,,,UT,EPSG:4326,0xE6100000010414000000F100619519775CC0...
...,...,...,...,...,...,...,...,...,...,...
1214,24123,CAag_RU547,DAU40323,Upper Russian,Detailed Analysis Units by County,,,CA,EPSG:4326,
1215,24124,CAag_RU548,DAU40423,Middle Russian,Detailed Analysis Units by County,,,CA,EPSG:4326,
1216,24125,CAag_RU549,DAU40449,Middle Russian,Detailed Analysis Units by County,,,CA,EPSG:4326,
1217,24126,CAag_RU550,DAU40523,Dry Creek,Detailed Analysis Units by County,,,CA,EPSG:4326,
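The `Unnamed: 0` column in the output above is what pandas typically shows when a CSV was written with its index and then read back without `index_col`. Assuming that is the case for these input files, the sketch below shows the difference on a small in-memory CSV (the text in `csv_text` is a made-up stand-in, not the real file):

```python
import io

import pandas as pd

# Made-up stand-in for a CSV that was saved together with its index column.
csv_text = ",ReportingUnitID,StateCV\n0,254,UT\n1,255,UT\n"

# Default read: the unnamed first column becomes a regular column "Unnamed: 0".
df_default = pd.read_csv(io.StringIO(csv_text))

# Treating the first column as the index keeps it out of the data columns.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df_default.columns))
print(list(df_indexed.columns))
```

Reading with `index_col=0` would keep `Unnamed: 0` out of the exported csv without a separate drop step.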


In [4]:
#Removing bad rows of df: rows with ReportingUnitUUID 'test' are otherwise null.
df_RU = df_RU[(df_RU.ReportingUnitUUID != 'test')]

#Dropping fields we don't need.
df_RU = df_RU.drop(['Geometry', 'ReportingUnitUpdateDate', 'ReportingUnitProductVersion'], axis=1)

#Fixing Index
df_RU = df_RU.reset_index(drop=True)

df_RU

Unnamed: 0,ReportingUnitID,ReportingUnitUUID,ReportingUnitNativeID,ReportingUnitName,ReportingUnitTypeCV,StateCV,EPSGCodeCV
0,254,00-01-03,00-01-03,Curlew Valley,Subarea,UT,EPSG:4326
1,255,000-01-03,000-01-03,Clear Creek,Subarea,UT,EPSG:4326
2,256,00-07-02,00-07-02,Promontory Point,Subarea,UT,EPSG:4326
3,257,000-01-02,000-01-02,Yost,Subarea,UT,EPSG:4326
4,258,000-02-00,000-02-00,Goose Creek,Subarea,UT,EPSG:4326
...,...,...,...,...,...,...,...
1214,24123,CAag_RU547,DAU40323,Upper Russian,Detailed Analysis Units by County,CA,EPSG:4326
1215,24124,CAag_RU548,DAU40423,Middle Russian,Detailed Analysis Units by County,CA,EPSG:4326
1216,24125,CAag_RU549,DAU40449,Middle Russian,Detailed Analysis Units by County,CA,EPSG:4326
1217,24126,CAag_RU550,DAU40523,Dry Creek,Detailed Analysis Units by County,CA,EPSG:4326


In [5]:
#Creating Linking Element with Shape File.
#Format Ex: "1_" + !NAME! + "_" + !StateNum!

stateNumDic = {
    "UT" : "46",
    "NM" : "35",
    "NE" : "41",
    "CO" : "42",
    "WY" : "47",
    "CA" : "49",
    "AZ" : "48",
    "TX" : "37",
    "US" : "100"
}

# Numeric prefix for each ReportingUnitTypeCV.
# Types not listed here (e.g., custom types) get an empty prefix.
unitTypeNumDic = {
    "County" : "1",
    "HUC8" : "2",
    "Basin" : "3",
    "Planning Area" : "4",
    "Active Management Area" : "5",
    "Tributary" : "6",
    "Subarea" : "7",
    "Hydrologic Region" : "8"
}

def createTypeNameNum(colrowValueA, colrowValueB, colrowValueC):
    StringA = colrowValueA.strip()  # ReportingUnitTypeCV
    StringB = colrowValueB.strip()  # ReportingUnitName
    StringC = colrowValueC.strip()  # StateCV, key into stateNumDic

    outStringA = unitTypeNumDic.get(StringA, "")
    return outStringA + "_" + StringB + "_" + stateNumDic[StringC]

df_RU['TypeNameNum'] = df_RU.apply(lambda row: createTypeNameNum(row['ReportingUnitTypeCV'], row['ReportingUnitName'], row['StateCV']), axis=1)
df_RU

Unnamed: 0,ReportingUnitID,ReportingUnitUUID,ReportingUnitNativeID,ReportingUnitName,ReportingUnitTypeCV,StateCV,EPSGCodeCV,TypeNameNum
0,254,00-01-03,00-01-03,Curlew Valley,Subarea,UT,EPSG:4326,7_Curlew Valley_46
1,255,000-01-03,000-01-03,Clear Creek,Subarea,UT,EPSG:4326,7_Clear Creek_46
2,256,00-07-02,00-07-02,Promontory Point,Subarea,UT,EPSG:4326,7_Promontory Point_46
3,257,000-01-02,000-01-02,Yost,Subarea,UT,EPSG:4326,7_Yost_46
4,258,000-02-00,000-02-00,Goose Creek,Subarea,UT,EPSG:4326,7_Goose Creek_46
...,...,...,...,...,...,...,...,...
1214,24123,CAag_RU547,DAU40323,Upper Russian,Detailed Analysis Units by County,CA,EPSG:4326,_Upper Russian_49
1215,24124,CAag_RU548,DAU40423,Middle Russian,Detailed Analysis Units by County,CA,EPSG:4326,_Middle Russian_49
1216,24125,CAag_RU549,DAU40449,Middle Russian,Detailed Analysis Units by County,CA,EPSG:4326,_Middle Russian_49
1217,24126,CAag_RU550,DAU40523,Dry Creek,Detailed Analysis Units by County,CA,EPSG:4326,_Dry Creek_49
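Rows whose ReportingUnitTypeCV is not one of the eight listed types get an empty prefix, as the `_Upper Russian_49` keys above show. One way to list such types before matching them by hand is sketched below on a small made-up frame; `df_demo` and `known_types` are illustrative names, not part of the notebook.

```python
import pandas as pd

# The eight unit types that receive a numeric prefix.
known_types = {
    "County", "HUC8", "Basin", "Planning Area",
    "Active Management Area", "Tributary", "Subarea", "Hydrologic Region",
}

# Made-up stand-in for df_RU.
df_demo = pd.DataFrame({
    "ReportingUnitTypeCV": [
        "Subarea",
        "Detailed Analysis Units by County",
        "Detailed Analysis Units by County",
        " County ",
    ]
})

# Strip whitespace the same way the key-building function does, then count misses.
types = df_demo["ReportingUnitTypeCV"].str.strip()
unmapped = types[~types.isin(known_types)].value_counts()
print(unmapped)
```

Any type that appears in `unmapped` would end up with a key starting with "_" and needs its own matching rule.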


In [6]:
#Creating Linking Element with Shape File.
#Format Ex: "1_" + !ID! + "_" + !StateNum!

stateNumDic = {
    "UT" : "46",
    "NM" : "35",
    "NE" : "41",
    "CO" : "42",
    "WY" : "47",
    "CA" : "49",
    "AZ" : "48",
    "TX" : "37",
    "US" : "100"
}


# Numeric prefix for each ReportingUnitTypeCV.
# Types not listed here (e.g., custom types) get an empty prefix.
unitTypeNumDic = {
    "County" : "1",
    "HUC8" : "2",
    "Basin" : "3",
    "Planning Area" : "4",
    "Active Management Area" : "5",
    "Tributary" : "6",
    "Subarea" : "7",
    "Hydrologic Region" : "8"
}

def createTypeNameNum(colrowValueA, colrowValueB, colrowValueC):
    StringA = colrowValueA.strip()  # ReportingUnitTypeCV
    StringB = colrowValueB.strip()  # ReportingUnitNativeID
    StringC = colrowValueC.strip()  # StateCV, key into stateNumDic

    outStringA = unitTypeNumDic.get(StringA, "")
    return outStringA + "_" + StringB + "_" + stateNumDic[StringC]

df_RU['TypeIDNum'] = df_RU.apply(lambda row: createTypeNameNum(row['ReportingUnitTypeCV'], row['ReportingUnitNativeID'], row['StateCV']), axis=1)
df_RU

Unnamed: 0,ReportingUnitID,ReportingUnitUUID,ReportingUnitNativeID,ReportingUnitName,ReportingUnitTypeCV,StateCV,EPSGCodeCV,TypeNameNum,TypeIDNum
0,254,00-01-03,00-01-03,Curlew Valley,Subarea,UT,EPSG:4326,7_Curlew Valley_46,7_00-01-03_46
1,255,000-01-03,000-01-03,Clear Creek,Subarea,UT,EPSG:4326,7_Clear Creek_46,7_000-01-03_46
2,256,00-07-02,00-07-02,Promontory Point,Subarea,UT,EPSG:4326,7_Promontory Point_46,7_00-07-02_46
3,257,000-01-02,000-01-02,Yost,Subarea,UT,EPSG:4326,7_Yost_46,7_000-01-02_46
4,258,000-02-00,000-02-00,Goose Creek,Subarea,UT,EPSG:4326,7_Goose Creek_46,7_000-02-00_46
...,...,...,...,...,...,...,...,...,...
1214,24123,CAag_RU547,DAU40323,Upper Russian,Detailed Analysis Units by County,CA,EPSG:4326,_Upper Russian_49,_DAU40323_49
1215,24124,CAag_RU548,DAU40423,Middle Russian,Detailed Analysis Units by County,CA,EPSG:4326,_Middle Russian_49,_DAU40423_49
1216,24125,CAag_RU549,DAU40449,Middle Russian,Detailed Analysis Units by County,CA,EPSG:4326,_Middle Russian_49,_DAU40449_49
1217,24126,CAag_RU550,DAU40523,Dry Creek,Detailed Analysis Units by County,CA,EPSG:4326,_Dry Creek_49,_DAU40523_49
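The last rows above show why the NativeID-based key can work better for some custom types: two "Middle Russian" units share the same TypeNameNum but have distinct TypeIDNum values. A sketch of a duplicate-key check, using those four rows copied into a small frame (`df_demo` is an illustrative name):

```python
import pandas as pd

# The four CA rows shown in the output above.
df_demo = pd.DataFrame({
    "ReportingUnitName": ["Upper Russian", "Middle Russian", "Middle Russian", "Dry Creek"],
    "TypeNameNum": ["_Upper Russian_49", "_Middle Russian_49", "_Middle Russian_49", "_Dry Creek_49"],
    "TypeIDNum": ["_DAU40323_49", "_DAU40423_49", "_DAU40449_49", "_DAU40523_49"],
})

# keep=False marks every member of a duplicated group, not just the repeats.
dup_name_keys = df_demo[df_demo.duplicated(subset="TypeNameNum", keep=False)]
dup_id_keys = df_demo[df_demo.duplicated(subset="TypeIDNum", keep=False)]

print(len(dup_name_keys), len(dup_id_keys))
```

A key column with no duplicates is the safer choice for a one-to-one join against the shp file.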


In [7]:
#The Output
df_RU.to_csv('Pagg_ReportingUnit.csv', index=False)

## AggregatedAmounts_withOrg

In [8]:
AAwO_Input = "dontopen_AggLJAll.csv"
df_AAwO = pd.read_csv(AAwO_Input)
df_AAwO.head(5)


Unnamed: 0,AggregatedAmountID,OrganizationID,ReportingUnitID,VariableSpecificID,WaterSourceID,MethodID,ReportYearCV,Amount,TimeframeStartID,TimeframeEndID,ReportingUnitUUID,ReportingUnitNativeID,ReportingUnitName,ReportingUnitTypeCV,TimeframeStartID.1,TimeframeEndID.1,WaterSourceTypeCV,VariableCV,VariableSpecificCV,ApplicableResourceTypeCV,MethodTypeCV,State,BeneficialUseCV
0,554278,3,254,115,338412,36,2013,0.0,1.0,42.0,00-01-03,00-01-03,Curlew Valley,Subarea,1.0,42.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial
1,554447,3,254,115,338412,36,2006,3.0,6.0,28.0,00-01-03,00-01-03,Curlew Valley,Subarea,6.0,28.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial
2,554616,3,254,115,338412,36,2012,0.0,8.0,30.0,00-01-03,00-01-03,Curlew Valley,Subarea,8.0,30.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial
3,554785,3,254,115,338412,36,2010,0.0,25.0,43.0,00-01-03,00-01-03,Curlew Valley,Subarea,25.0,43.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial
4,554954,3,254,115,338412,36,2008,3.0,27.0,40.0,00-01-03,00-01-03,Curlew Valley,Subarea,27.0,40.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial


In [9]:
date_Input = "dontopen_Date_dim.csv"
df_date = pd.read_csv(date_Input)
df_date.head(5)

Unnamed: 0,DateID,Date,Year
0,1,2013-10-01,2013
1,2,2003-09-01,2003
2,3,2003-10-01,2003
3,4,2006-08-01,2006
4,5,2006-09-01,2006


In [10]:
#Retrieving TimeframeStart from Dates_dim Table
TimeframeStartdict = pd.Series(df_date.Date.values, index = df_date.DateID).to_dict()

# For creating TimeframeStart
def retrieveTimeframeStart(colrowValue):
    # Blank or missing IDs map to an empty string.
    if colrowValue == '' or pd.isnull(colrowValue):
        return ''
    # Fall back to the raw ID when it has no entry in the date table.
    return TimeframeStartdict.get(colrowValue, colrowValue)

df_AAwO['TimeframeStart'] = df_AAwO['TimeframeStartID'].apply(retrieveTimeframeStart)
df_AAwO.head(5)

Unnamed: 0,AggregatedAmountID,OrganizationID,ReportingUnitID,VariableSpecificID,WaterSourceID,MethodID,ReportYearCV,Amount,TimeframeStartID,TimeframeEndID,ReportingUnitUUID,ReportingUnitNativeID,ReportingUnitName,ReportingUnitTypeCV,TimeframeStartID.1,TimeframeEndID.1,WaterSourceTypeCV,VariableCV,VariableSpecificCV,ApplicableResourceTypeCV,MethodTypeCV,State,BeneficialUseCV,TimeframeStart
0,554278,3,254,115,338412,36,2013,0.0,1.0,42.0,00-01-03,00-01-03,Curlew Valley,Subarea,1.0,42.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2013-10-01
1,554447,3,254,115,338412,36,2006,3.0,6.0,28.0,00-01-03,00-01-03,Curlew Valley,Subarea,6.0,28.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2006-10-01
2,554616,3,254,115,338412,36,2012,0.0,8.0,30.0,00-01-03,00-01-03,Curlew Valley,Subarea,8.0,30.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2012-10-01
3,554785,3,254,115,338412,36,2010,0.0,25.0,43.0,00-01-03,00-01-03,Curlew Valley,Subarea,25.0,43.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2010-10-01
4,554954,3,254,115,338412,36,2008,3.0,27.0,40.0,00-01-03,00-01-03,Curlew Valley,Subarea,27.0,40.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2008-10-01


In [11]:
#Retrieving TimeframeEnd from Dates_dim Table
TimeframeEnddict = pd.Series(df_date.Date.values, index = df_date.DateID).to_dict()

# For creating TimeframeEnd
def retrieveTimeframeEnd(colrowValue):
    # Blank or missing IDs map to an empty string.
    if colrowValue == '' or pd.isnull(colrowValue):
        return ''
    # Fall back to the raw ID when it has no entry in the date table.
    return TimeframeEnddict.get(colrowValue, colrowValue)

df_AAwO['TimeframeEnd'] = df_AAwO['TimeframeEndID'].apply(retrieveTimeframeEnd)
df_AAwO.head(5)

Unnamed: 0,AggregatedAmountID,OrganizationID,ReportingUnitID,VariableSpecificID,WaterSourceID,MethodID,ReportYearCV,Amount,TimeframeStartID,TimeframeEndID,ReportingUnitUUID,ReportingUnitNativeID,ReportingUnitName,ReportingUnitTypeCV,TimeframeStartID.1,TimeframeEndID.1,WaterSourceTypeCV,VariableCV,VariableSpecificCV,ApplicableResourceTypeCV,MethodTypeCV,State,BeneficialUseCV,TimeframeStart,TimeframeEnd
0,554278,3,254,115,338412,36,2013,0.0,1.0,42.0,00-01-03,00-01-03,Curlew Valley,Subarea,1.0,42.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2013-10-01,2013-09-30
1,554447,3,254,115,338412,36,2006,3.0,6.0,28.0,00-01-03,00-01-03,Curlew Valley,Subarea,6.0,28.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2006-10-01,2006-09-30
2,554616,3,254,115,338412,36,2012,0.0,8.0,30.0,00-01-03,00-01-03,Curlew Valley,Subarea,8.0,30.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2012-10-01,2012-09-30
3,554785,3,254,115,338412,36,2010,0.0,25.0,43.0,00-01-03,00-01-03,Curlew Valley,Subarea,25.0,43.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2010-10-01,2010-09-30
4,554954,3,254,115,338412,36,2008,3.0,27.0,40.0,00-01-03,00-01-03,Curlew Valley,Subarea,27.0,40.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2008-10-01,2008-09-30
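The two lookups above can also be written as a vectorized `Series.map`, with `pd.to_datetime` giving the result a real date dtype. This is an alternative sketch, not the notebook's method: unmatched or missing IDs become `NaT` here instead of an empty string or the raw ID. The frames `df_date_demo` and `df_demo` are small stand-ins; the date rows are the first three shown in the Date_dim output.

```python
import pandas as pd

# First three rows of the date table shown earlier.
df_date_demo = pd.DataFrame({
    "DateID": [1, 2, 3],
    "Date": ["2013-10-01", "2003-09-01", "2003-10-01"],
})

# Stand-in for the ID column; it holds floats because some IDs are missing.
df_demo = pd.DataFrame({"TimeframeStartID": [1.0, 3.0, None]})

# Lookup Series: DateID -> Date, with the index cast to float to match the ID column.
date_lookup = df_date_demo.set_index("DateID")["Date"]
date_lookup.index = date_lookup.index.astype(float)

# map() leaves NaN where there is no match; to_datetime turns that into NaT.
df_demo["TimeframeStart"] = pd.to_datetime(df_demo["TimeframeStartID"].map(date_lookup))
print(df_demo)
```

The same `date_lookup` could serve both the start and end columns, since both come from the same date table.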


In [12]:
#Dropping fields we don't need.
df_AAwO = df_AAwO.drop(['TimeframeStartID', 'TimeframeEndID'], axis=1)
df_AAwO.head(5)

Unnamed: 0,AggregatedAmountID,OrganizationID,ReportingUnitID,VariableSpecificID,WaterSourceID,MethodID,ReportYearCV,Amount,ReportingUnitUUID,ReportingUnitNativeID,ReportingUnitName,ReportingUnitTypeCV,TimeframeStartID.1,TimeframeEndID.1,WaterSourceTypeCV,VariableCV,VariableSpecificCV,ApplicableResourceTypeCV,MethodTypeCV,State,BeneficialUseCV,TimeframeStart,TimeframeEnd
0,554278,3,254,115,338412,36,2013,0.0,00-01-03,00-01-03,Curlew Valley,Subarea,1.0,42.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2013-10-01,2013-09-30
1,554447,3,254,115,338412,36,2006,3.0,00-01-03,00-01-03,Curlew Valley,Subarea,6.0,28.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2006-10-01,2006-09-30
2,554616,3,254,115,338412,36,2012,0.0,00-01-03,00-01-03,Curlew Valley,Subarea,8.0,30.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2012-10-01,2012-09-30
3,554785,3,254,115,338412,36,2010,0.0,00-01-03,00-01-03,Curlew Valley,Subarea,25.0,43.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2010-10-01,2010-09-30
4,554954,3,254,115,338412,36,2008,3.0,00-01-03,00-01-03,Curlew Valley,Subarea,27.0,40.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2008-10-01,2008-09-30


In [13]:
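#Filling missing BeneficialUseCV values (assignment avoids inplace on a column selection, which newer pandas warns about)
df_AAwO['BeneficialUseCV'] = df_AAwO['BeneficialUseCV'].fillna('Unspecified')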
df_AAwO

Unnamed: 0,AggregatedAmountID,OrganizationID,ReportingUnitID,VariableSpecificID,WaterSourceID,MethodID,ReportYearCV,Amount,ReportingUnitUUID,ReportingUnitNativeID,ReportingUnitName,ReportingUnitTypeCV,TimeframeStartID.1,TimeframeEndID.1,WaterSourceTypeCV,VariableCV,VariableSpecificCV,ApplicableResourceTypeCV,MethodTypeCV,State,BeneficialUseCV,TimeframeStart,TimeframeEnd
0,554278,3,254,115,338412,36,2013,0.0,00-01-03,00-01-03,Curlew Valley,Subarea,1.0,42.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2013-10-01,2013-09-30
1,554447,3,254,115,338412,36,2006,3.0,00-01-03,00-01-03,Curlew Valley,Subarea,6.0,28.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2006-10-01,2006-09-30
2,554616,3,254,115,338412,36,2012,0.0,00-01-03,00-01-03,Curlew Valley,Subarea,8.0,30.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2012-10-01,2012-09-30
3,554785,3,254,115,338412,36,2010,0.0,00-01-03,00-01-03,Curlew Valley,Subarea,25.0,43.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2010-10-01,2010-09-30
4,554954,3,254,115,338412,36,2008,3.0,00-01-03,00-01-03,Curlew Valley,Subarea,27.0,40.0,Unspecified,Withdrawal,"Withdrawal, Public Supply",Modeled,Modeled,UT,Municipal/Industrial,2008-10-01,2008-09-30
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
116057,577758,8,23025,72,462891,61,2007,442.0,TX_9,9,TRINITY-SAN JACINTO,Basin,37752.0,36160.0,Surface,Consumptive Use,Consumptive Use,Unspecified,Water Use,TX,Steam-Electric Power_surface,2020-01-01,2020-12-31
116058,577781,8,23025,72,462891,61,2007,2578.0,TX_9,9,TRINITY-SAN JACINTO,Basin,37752.0,36160.0,Surface,Consumptive Use,Consumptive Use,Unspecified,Water Use,TX,Mining_surface,2020-01-01,2020-12-31
116059,577804,8,23025,72,462891,61,2007,79975.0,TX_9,9,TRINITY-SAN JACINTO,Basin,37752.0,36160.0,Surface,Consumptive Use,Consumptive Use,Unspecified,Water Use,TX,Manufacturing_surface,2020-01-01,2020-12-31
116060,577827,8,23025,72,462891,61,2007,473342.0,TX_9,9,TRINITY-SAN JACINTO,Basin,37752.0,36160.0,Surface,Consumptive Use,Consumptive Use,Unspecified,Water Use,TX,Municipal_surface,2020-01-01,2020-12-31


In [14]:
#The Output
df_AAwO.to_csv('Pagg_AggregatedAmountsAll.csv', index=False)