# Demo: data analysis

This version handles the **AERONET AOD and INV data**. Future versions will include MODIS observations and comparisons with model results.

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [3]:
import os
import aerosol_obs_analysis as ana

import datetime
time_range = [datetime.date(2006,1,1),datetime.date(2006,12,31)]

## Download the data

Data files can be downloaded from the [AERONET website](https://aeronet.gsfc.nasa.gov/). This version only handles the daily data. The all-site archives can be downloaded here: [AOD](https://aeronet.gsfc.nasa.gov/data_push/V3/AOD/AOD_Level20_Daily_V3.tar.gz) (~63 MB as of 2019/12/12) and [INV](https://aeronet.gsfc.nasa.gov/data_push/V3/INV/INV_Level2_Daily_V3.tar.gz) (~130 MB as of 2019/12/12). You can also use the download tool on the AERONET homepage to select specific sites, dates and products.
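The download can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `local_path` and `download_archives` are hypothetical and not part of `aerosol_obs_analysis`, and the `data/all` target directory is simply the one used later in this demo.

```python
import os
import urllib.request
from urllib.parse import urlparse

# URLs copied from the links in this section.
AERONET_URLS = {
    "AOD": "https://aeronet.gsfc.nasa.gov/data_push/V3/AOD/AOD_Level20_Daily_V3.tar.gz",
    "INV": "https://aeronet.gsfc.nasa.gov/data_push/V3/INV/INV_Level2_Daily_V3.tar.gz",
}


def local_path(url, dest_dir="data/all"):
    """Return the local path an archive is saved to (hypothetical helper)."""
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(dest_dir, filename)


def download_archives(urls=AERONET_URLS, dest_dir="data/all"):
    """Download each archive into dest_dir, skipping files that already exist."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls.values():
        path = local_path(url, dest_dir)
        if not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
        saved.append(path)
    return saved
```

Calling `download_archives()` fetches both archives; because existing files are skipped, rerunning the cell does not download them again.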

## Read data files and basic analysis

### All points data

Here, the data files are saved in the `data/all` directory:

In [7]:
os.listdir('data/all')

['AOD_Level20_Daily_V3.tar.gz', 'INV_Level2_Daily_V3.tar.gz']

Start the analysis by creating an `aeronet` object. You can specify the workspace path with `pickle_path='XXX'` and the time range with `time_range=[start,end]`.

In [4]:
aeronet = ana.aeronet(time_range = time_range)

AERONET data object created. Use .read_pickle() to load data from pickle, or .analysis_guide() to read raw data.


Data can be loaded from a saved pickle:

In [5]:
aeronet.load_pickle(pickle_name='COM20_20060101_20061231.pkl')

Loading AERONET data from pickle: COM20_20060101_20061231.pkl


When reading the all-site data for the first time, `.analysis_guide()` is recommended. The guide performs the analysis from raw or partly processed data. Pay attention to each prompt: some accept Enter for YES, while others require typing `YES`.

It is divided into three parts:

1. Basic analysis of AOD data
2. Basic analysis of INV data
3. Combine AOD and INV data into one file

You may choose to save the intermediate files for later use. See appendix for details.
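The guide's log uses two different prompt conventions: "Press enter for YES, other keys for NO" and 'Type "YES" for yes'. The sketch below reads both literally; it is an illustration only, not the package's code, and the function names, the whitespace stripping and the case sensitivity are assumptions.

```python
def enter_means_yes(answer):
    """'Press enter for YES, other keys for NO': only an empty answer is YES."""
    return answer.strip() == ""


def typed_yes(answer):
    """'Type "YES" for yes': only the literal word YES counts as yes."""
    return answer.strip() == "YES"


# Under this literal reading, typing YES at an "enter for YES" prompt means NO.
print(enter_means_yes(""), enter_means_yes("YES"), typed_yes("YES"))
```

If the guide behaves this way, it is safest to press Enter at the first kind of prompt and to type `YES` in full at the second.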


The following demonstrates the analysis starting from extracting the tar archive.

In [6]:
aeronet.analysis_guide()


##### Analysis from raw/half-processed data #####
Load AOD data (all records) from pickle? 
Press enter for YES, other keys for NO: NO

#### Load AERONET AOD all site data from raw data ####
Enter directory of raw files? 
Press enter for default (./): ./data/all/
Extract .tar? Type "YES" for yes: YES
Name of .tar file? AOD_Level20_Daily_V3.tar.gz
Extracting tar file: ./data/all/AOD_Level20_Daily_V3.tar.gz at ./data/all/
Finish extracting tar file
Path for files? 
Press enter for default (./data/all/AOD/AOD20/DAILY/): 
Loading AERONET AOD all site data from ./data/all/AOD/AOD20/DAILY/
Skim data? 
Press enter for YES, other keys for NO: 

Finished extracting AOD data from raw data
Save pickle? 
Press enter for YES, other keys for NO: 
Pickle name to save? 
Press enter for default (AOD20_daily_all_sites.pkl): 

Saved df_aod_all as workspace/AERONET/AOD20_daily_all_sites.pkl 
************
Load INV df (all records) from pickle? 
Press enter for YES, other keys for NO: NO

#### Load AERONET
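The extraction step shown in the log can also be done by hand, for example to inspect the archive before running the guide. This is a minimal sketch with the standard-library `tarfile` module; `extract_archive` is a hypothetical helper, not a function of `aerosol_obs_analysis`.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir):
    """Extract a .tar or .tar.gz archive into dest_dir and return member names."""
    os.makedirs(dest_dir, exist_ok=True)
    # Mode "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)
    return names
```

With the paths from the log, the call would be `extract_archive('./data/all/AOD_Level20_Daily_V3.tar.gz', './data/all/')`. Only extract archives from a source you trust, since a tar file can contain arbitrary paths.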

Alternatively, you can load the all-record data from a pickle and filter it for a different time range or set of sites.

In [22]:
time_range2 = [datetime.date(2016,1,1),datetime.date(2016,12,31)]
aeronet2 = ana.aeronet(time_range = time_range2)

aeronet2.analysis_guide()

AERONET data object created. Use .read_pickle() to load data from pickle, or .analysis_guide() to read raw data.

##### Analysis from raw/half-processed data #####
Load AOD data (all records) from pickle? 
Press enter for YES, other keys for NO: 
Pickle name? 
Press enter for default (AOD20_daily_all_sites.pkl): 
Loading AERONET AOD all site data from pickle: workspace/AERONET/AOD20_daily_all_sites.pkl
Loaded AERONET AOD all site data from pickle: workspace/AERONET/AOD20_daily_all_sites.pkl
************
Load INV df (all records) from pickle? 
Press enter for YES, other keys for NO: 
Pickle name? 
Press enter for default (INV20_daily_all_sites.pkl): 
Loading AERONET INV all site data from pickle: workspace/AERONET/INV20_daily_all_sites.pkl
Loaded AERONET INV all site data from pickle: workspace/AERONET/INV20_daily_all_sites.pkl
************
Combine AOD and INV df? 
Press enter for YES, other keys for NO: 
Filter time range? 
Press enter for YES, other keys for NO: 
Time range: [datetime

## Data

The data are stored in `aeronet.df`:

In [47]:
aeronet.df.head()

| | AERONET_Site_Name | Date(dd:mm:yyyy) | Day_of_Year | Site_Latitude(Degrees) | Site_Longitude(Degrees) | AOD_500nm | 440-870_Angstrom_Exponent | REff-C | REff-F | REff-T | ... | Asymmetry_Factor-Total[440nm] | Asymmetry_Factor-Total[675nm] | Asymmetry_Factor-Total[870nm] | Absorption_AOD[440nm] | Absorption_AOD[675nm] | Absorption_AOD[870nm] | Single_Scattering_Albedo[440nm] | Single_Scattering_Albedo[675nm] | Single_Scattering_Albedo[870nm] | dV/dlnr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1720 | Abu_Al_Bukhoosh | 2006-11-21 | 325 | 25.495 | 53.145833 | | 0.61179 | | | | ... | | | | | | | | | | |
| 1721 | Abu_Al_Bukhoosh | 2006-11-22 | 326 | 25.495 | 53.145833 | | 1.124698 | | | | ... | | | | | | | | | | |
| 1722 | Abu_Al_Bukhoosh | 2006-11-23 | 327 | 25.495 | 53.145833 | | 1.326807 | | | | ... | | | | | | | | | | |
| 1723 | Abu_Al_Bukhoosh | 2006-11-24 | 328 | 25.495 | 53.145833 | | 1.202353 | | | | ... | | | | | | | | | | |
| 1724 | Abu_Al_Bukhoosh | 2006-11-25 | 329 | 25.495 | 53.145833 | | 1.383928 | | | | ... | | | | | | | | | | |


In [49]:
aeronet.df.columns

Index(['AERONET_Site_Name', 'Date(dd:mm:yyyy)', 'Day_of_Year',
       'Site_Latitude(Degrees)', 'Site_Longitude(Degrees)', 'AOD_500nm',
       '440-870_Angstrom_Exponent', 'REff-C', 'REff-F', 'REff-T', 'Bin 1',
       'Bin 2', 'Bin 3', 'Bin 4', 'Bin 5', 'Bin 6', 'Bin 7', 'Bin 8', 'Bin 9',
       'Bin 10', 'Bin 11', 'Bin 12', 'Bin 13', 'Bin 14', 'Bin 15', 'Bin 16',
       'Bin 17', 'Bin 18', 'Bin 19', 'Bin 20', 'Bin 21', 'Bin 22',
       'AOD_Extinction-Coarse[440nm]', 'AOD_Extinction-Coarse[675nm]',
       'AOD_Extinction-Coarse[870nm]', 'AOD_Extinction-Fine[440nm]',
       'AOD_Extinction-Fine[675nm]', 'AOD_Extinction-Fine[870nm]',
       'AOD_Extinction-Total[440nm]', 'AOD_Extinction-Total[675nm]',
       'AOD_Extinction-Total[870nm]', 'Asymmetry_Factor-Coarse[440nm]',
       'Asymmetry_Factor-Coarse[675nm]', 'Asymmetry_Factor-Coarse[870nm]',
       'Asymmetry_Factor-Fine[440nm]', 'Asymmetry_Factor-Fine[675nm]',
       'Asymmetry_Factor-Fine[870nm]', 'Asymmetry_Factor-Total[440nm]',
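Because `aeronet.df` has the `AERONET_Site_Name` and `Date(dd:mm:yyyy)` columns shown above, a time-range or site filter can also be applied directly with pandas. The sketch below is not the package's own filtering code: `filter_records` is a hypothetical helper, and it assumes the date column can be parsed by `pd.to_datetime`. The three demo rows are copied from the `head()` output.

```python
import datetime

import pandas as pd

DATE_COL = "Date(dd:mm:yyyy)"
SITE_COL = "AERONET_Site_Name"


def filter_records(df, time_range=None, sites=None):
    """Keep rows dated within [start, end] and, optionally, within `sites`."""
    out = df
    if time_range is not None:
        start, end = (pd.Timestamp(d) for d in time_range)
        dates = pd.to_datetime(out[DATE_COL])
        out = out[(dates >= start) & (dates <= end)]
    if sites is not None:
        out = out[out[SITE_COL].isin(sites)]
    return out


# Three rows copied from the head() output of aeronet.df.
demo = pd.DataFrame({
    SITE_COL: ["Abu_Al_Bukhoosh"] * 3,
    DATE_COL: ["2006-11-21", "2006-11-22", "2006-11-23"],
    "440-870_Angstrom_Exponent": [0.61179, 1.124698, 1.326807],
})
subset = filter_records(
    demo, time_range=[datetime.date(2006, 11, 22), datetime.date(2006, 12, 31)]
)
print(subset)
```

Here the first row (2006-11-21) falls outside the requested range, so two rows remain.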


## Calculating averages

If `min_rec` is specified, sites with fewer than that number of records are excluded. The days used to calculate each average are also stored, in the `Day_of_Year` column.

In [6]:
avg_aod = aeronet.cal_average(columns=['AOD_500nm'],min_rec=30)
avg_aod.head()

| AERONET_Site_Name | Day_of_Year | Site_Latitude(Degrees) | Site_Longitude(Degrees) | AOD_500nm | Record_number |
|---|---|---|---|---|---|
| Adelaide_Site_7 | [94, 95, 97, 98, 99, 101, 102, 103, 105, 106, ... | -34.725067 | 138.656483 | 0.064155 | 194 |
| Al_Ain | [229, 235, 236, 237, 238, 239, 240, 241, 242, ... | 24.242167 | 55.705167 | 0.331605 | 31 |
| Alta_Floresta | [3, 9, 13, 17, 18, 19, 20, 21, 23, 25, 27, 28,... | -9.871339 | -56.104453 | 0.342905 | 174 |
| Ames | [4, 9, 10, 11, 12, 13, 14, 15, 18, 23, 24, 26,... | 42.021361 | -93.774778 | 0.184697 | 185 |
| Andenes | [188, 192, 209, 210, 211, 213, 217, 224, 225, ... | 69.278333 | 16.008611 | 0.099436 | 36 |
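To make the effect of `min_rec` concrete, here is a minimal pandas sketch of a per-site average with a record threshold. It is not the implementation of `cal_average`: `site_average` is a hypothetical helper, and it assumes that a "record" is a non-missing value and that a site with exactly `min_rec` records is kept. The toy values are made up for illustration.

```python
import pandas as pd


def site_average(df, column, min_rec=None, site_col="AERONET_Site_Name"):
    """Mean of `column` per site, dropping sites with fewer than min_rec values."""
    grouped = df.groupby(site_col)[column]
    result = pd.DataFrame({
        column: grouped.mean(),
        "Record_number": grouped.count(),  # non-missing values per site
    })
    if min_rec is not None:
        result = result[result["Record_number"] >= min_rec]
    return result


# Made-up values, only to show the effect of min_rec.
toy = pd.DataFrame({
    "AERONET_Site_Name": ["Site_A", "Site_A", "Site_A", "Site_B"],
    "AOD_500nm": [0.10, 0.20, 0.30, 0.50],
})
print(site_average(toy, "AOD_500nm", min_rec=2))
```

With `min_rec=2`, `Site_B` (one record) is dropped and only `Site_A` remains, with its three records averaged.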


## Searching sites

In [24]:
aeronet.select_sites(lat=[0,30],lon=[0,10])

['Banizoumbou',
 'Djougou',
 'Ilorin',
 'Niamey',
 'Tamanrasset_INM',
 'Tamanrasset_TMP']
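A latitude/longitude search like this amounts to a bounding-box filter on the site coordinates. The sketch below shows the idea with pandas; `sites_in_box` is a hypothetical helper rather than the code behind `select_sites`, and treating both bounds as inclusive is an assumption. The coordinates are copied from the averages table in the previous section.

```python
import pandas as pd

LAT_COL = "Site_Latitude(Degrees)"
LON_COL = "Site_Longitude(Degrees)"


def sites_in_box(df, lat, lon, site_col="AERONET_Site_Name"):
    """Return sorted names of sites inside the [lat[0], lat[1]] x [lon[0], lon[1]] box."""
    in_lat = df[LAT_COL].between(lat[0], lat[1])  # inclusive on both ends
    in_lon = df[LON_COL].between(lon[0], lon[1])
    return sorted(df.loc[in_lat & in_lon, site_col].unique())


# Coordinates copied from the averages table.
coords = pd.DataFrame({
    "AERONET_Site_Name": ["Adelaide_Site_7", "Al_Ain", "Alta_Floresta", "Ames", "Andenes"],
    LAT_COL: [-34.725067, 24.242167, -9.871339, 42.021361, 69.278333],
    LON_COL: [138.656483, 55.705167, -56.104453, -93.774778, 16.008611],
})
print(sites_in_box(coords, lat=[0, 30], lon=[50, 60]))  # ['Al_Ain']
```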

## Appendix: Flow of questions in analysis guide

**1. Basic analysis of AOD data**

Q1. Load AOD data (all records) from pickle?
(e.g. answer Yes if you have already saved a pickle of all records)

    No -> Q2       Yes -> part 2

Q2. Directory of raw files?

Q3. Extract .tar?

    Yes -> Q4      No -> Q5
    
Q4. Name of .tar file?

Q5. Path for files? 

Q6. Skim data? (Keeping only `AOD_500nm` and `440-870_Angstrom_Exponent` data)

    Yes -> Q7      No -> Q7

Q7. Save pickle? 

    Yes -> Q8      No -> part 2

Q8. Pickle name to save? 

**2. Basic analysis of INV data**

Similar to part 1, but `skim_data=True` keeps only the size distribution.

**3. Combine AOD and INV data into one file**

Q9. Combine AOD and INV df? 

    Yes -> Q10     No -> exit

Q10. Filter time range? (Time range is specified at object initialization)
    
Q11. Filter site?

    Yes -> Q12     No -> Q13
    
Q12. Site name?    

Q13. Save pickle? 

    Yes -> Q14      No -> exit

Q14. Pickle name to save? 