# Comparison between "Comprehensive upper-air observation network from 1905 to present" and "Insitu IGRA radiosoundings baseline network"

**Contains modified Copernicus Climate Change Service Information 2020**
under [License](https://apps.ecmwf.int/datasets/licences/copernicus/)

*Copernicus Climate Change Service (C3S) - Upper Air Data Service (2020)*

The purpose of this IPython notebook is to compare the two data sets named above and to identify the differences between them.

    Author: U. Voggenberger
    Date: 10.2020
    Contact: ulrich.voggenberger [at] univie.ac.at
    License: C3S, 2020


In [1]:
import pandas
import numpy as np
import sys, zipfile, os, time
import matplotlib.pyplot as plt
import glob
import datetime
import urllib3
import cdsapi

In [2]:
import matplotlib.pylab as pylab
params = {'legend.fontsize': 'x-large',
          'figure.figsize': (16, 10),
         'axes.labelsize': 20,
         'axes.titlesize': 24,
         'xtick.labelsize':'medium',
         'ytick.labelsize':'medium'}
pylab.rcParams.update(params)

# Comparing request times for IGRA (and its harmonized version) and the Comprehensive upper-air observation network (COMP)

### All stations, one whole month: January 2015

We measure the time that elapses while sending the request, processing the data on the server, and downloading the result from the CDS.
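The timing pattern used in the cells below (`t0 = time.time()` before the request, a subtraction afterwards) can also be wrapped in a small helper. The following is only a sketch: the `timed` helper and the `timings` dictionary are hypothetical names that do not appear in the notebook, and `time.sleep` stands in for the actual `c.retrieve(...)` and `r.download(...)` calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("demo", timings):
    time.sleep(0.05)  # stand-in for the CDS request and download

print(f"Time elapsed: {timings['demo']:.2f} s")
```

Collecting the durations in one dictionary makes it easier to compare several requests afterwards.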

## IGRA

In [21]:
t0 = time.time()

c = cdsapi.Client()
r = c.retrieve(
    'insitu-observations-igra-baseline-network',
    {
        'source': 'IGRA_H',
        'area': [
            90, -180, -90,
            180,
        ],
        'format': 'csv-lev.zip',
        'variable': 'air_temperature',
        'day': [
            '01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12',
            '13', '14', '15',
            '16', '17', '18',
            '19', '20', '21',
            '22', '23', '24',
            '25', '26', '27',
            '28', '29', '30',
            '31',
        ],
        'month': '01',
        'year': '2015',
    },
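    'download.csv-lev.zip')  # a target passed to retrieve() already downloads the result once
if True:
    # Download again to 'download.zip'; the log below therefore shows two transfers
    r.download(target='download.zip')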
    # Check file size
    assert os.stat('download.zip').st_size == r.content_length, "Downloaded file is incomplete"
    
print("Time elapsed: ", time.time()-t0, "s")

2020-10-20 14:32:15,877 INFO Welcome to the CDS
2020-10-20 14:32:15,877 INFO Sending request to https://sis-dev.climate.copernicus.eu/api/v2/resources/insitu-observations-igra-baseline-network
2020-10-20 14:32:15,924 INFO Request is queued
2020-10-20 14:32:16,958 INFO Request is running
2020-10-20 14:32:24,189 INFO Request is completed
2020-10-20 14:32:24,189 INFO Downloading http://136.156.132.176/cache-compute-0000/cache/data2/adaptor.insitu_reference.retrieve_test-1603197141.1084538-16035-20-f427ecd0-d8dd-4ce7-b6b1-0fac6172c6df.zip to download.csv-lev.zip (4.4M)
2020-10-20 14:32:24,773 INFO Download rate 7.6M/s   
2020-10-20 14:32:24,853 INFO Downloading http://136.156.132.176/cache-compute-0000/cache/data2/adaptor.insitu_reference.retrieve_test-1603197141.1084538-16035-20-f427ecd0-d8dd-4ce7-b6b1-0fac6172c6df.zip to download.zip (4.4M)
2020-10-20 14:32:24,998 INFO Download rate 30.7M/s  


Time elapsed:  9.253928661346436 s


In [4]:
z = zipfile.ZipFile('download.zip')
print("Unzipping retrieved files")
print(z.namelist())
z.extractall(path='./REQUESTTEST/IGRA')
z.close()
os.remove('download.zip')

Unzipping retrieved files
['IGRA_H_20150101_20150131_global_cdm-lev.csv']
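The unzip step can also be written with a context manager, so the archive is closed even if extraction fails. This is a minimal sketch on a throwaway archive; `extract_all`, `workdir` and `demo.zip` are hypothetical names used only for illustration.

```python
import os
import tempfile
import zipfile


def extract_all(zip_path, out_dir):
    """Extract every member of zip_path into out_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(path=out_dir)
    return names


# Build a small demo archive instead of using a real CDS download
workdir = tempfile.mkdtemp()
demo_zip = os.path.join(workdir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("example.csv", "a,b\n1,2\n")

extracted = extract_all(demo_zip, os.path.join(workdir, "out"))
print(extracted)
```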


---

## COMP

In [6]:
t0 = time.time()

c = cdsapi.Client()
r = c.retrieve(
    'insitu-comprehensive-upper-air-observation-network',
    {
        'variable': ['air_temperature',],
        'date': ['20150101-20150131'],
        'format': 'csv',
    })
if True:
    # Start Download
    r.download(target='download.zip')
    # Check file size
    assert os.stat('download.zip').st_size == r.content_length, "Downloaded file is incomplete"
    
print("Time elapsed: ", time.time()-t0, "s")

2020-10-20 14:01:54,054 INFO Welcome to the CDS
2020-10-20 14:01:54,054 INFO Sending request to https://sis-dev.climate.copernicus.eu/api/v2/resources/insitu-comprehensive-upper-air-observation-network
2020-10-20 14:01:54,298 INFO Request is queued
2020-10-20 14:01:55,331 INFO Request is running
2020-10-20 14:06:12,556 INFO Request is completed
2020-10-20 14:06:12,596 INFO Downloading http://136.156.132.176/cache-compute-0000/cache/data2/adaptor.comprehensive_upper_air.retrieve-1603195517.5789483-30763-18-b8e7be79-b79e-4ac2-be55-f7ccbf81cb36.zip to download.zip (88.7M)
2020-10-20 14:06:14,552 INFO Download rate 45.4M/s  


Time elapsed:  260.66215658187866 s


In [7]:
z = zipfile.ZipFile('download.zip')
print("Unzipping retrieved files")
print(z.namelist())
z.extractall(path='./REQUESTTEST/COMP')
z.close()
os.remove('download.zip')

Unzipping retrieved files
['temperature.csv']


---

## COMP as .nc

In [9]:
t0 = time.time()

c = cdsapi.Client()
r = c.retrieve(
    'insitu-comprehensive-upper-air-observation-network',
    {
        'variable': ['air_temperature',],
        'date': ['20150101-20150131'],
        'format': 'nc',
    })
if True:
    # Start Download
    r.download(target='download.zip')
    # Check file size
    assert os.stat('download.zip').st_size == r.content_length, "Downloaded file is incomplete"
    
print("Time elapsed: ", time.time()-t0, "s")

2020-10-20 14:07:07,041 INFO Welcome to the CDS
2020-10-20 14:07:07,042 INFO Sending request to https://sis-dev.climate.copernicus.eu/api/v2/resources/insitu-comprehensive-upper-air-observation-network
2020-10-20 14:07:07,109 INFO Request is queued
2020-10-20 14:07:08,141 INFO Request is running
2020-10-20 14:07:56,660 INFO Request is completed
2020-10-20 14:07:56,700 INFO Downloading http://136.156.132.176/cache-compute-0002/cache/data2/adaptor.comprehensive_upper_air.retrieve-1603195666.9085066-26007-21-8606fa00-bb2e-48e9-9645-8025bb42b691.zip to download.zip (145.5M)
2020-10-20 14:08:04,245 INFO Download rate 19.3M/s 


Time elapsed:  57.233633279800415 s
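To put the three elapsed times side by side, the short sketch below computes each one relative to the IGRA_H request. The values are the rounded "Time elapsed" outputs printed above. Each is a single measurement, the payload sizes differ (4.4M, 88.7M and 145.5M according to the logs), and the IGRA log shows the file being transferred twice, so the ratios are only a rough indication.

```python
# Rounded "Time elapsed" values from the three runs above, in seconds
elapsed_s = {
    "IGRA_H (csv)": 9.25,
    "COMP (csv)": 260.66,
    "COMP (nc)": 57.23,
}

baseline = elapsed_s["IGRA_H (csv)"]
ratios = {name: seconds / baseline for name, seconds in elapsed_s.items()}

for name, ratio in ratios.items():
    print(f"{name}: {elapsed_s[name]:.2f} s ({ratio:.1f}x IGRA_H)")
```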


---

# What's inside the data:

## IGRA

In [66]:
files = glob.glob('REQUESTTEST/IGRA/*.csv')
igra = pandas.read_csv(files[0], header=11)
igra

Unnamed: 0,station_name,report_timestamp,actual_time,report_id,location_longitude,location_latitude,height_of_station_above_sea_level,air_pressure,air_temperature
0,ZZV000ASFR4,2015-01-01 00:00:00+00,2014-12-31 22:48:00+00,19067,-998.8890,-98.8888,-998.8,30000,
1,ZZV000ASFR4,2015-01-01 00:00:00+00,2014-12-31 22:48:00+00,19067,-998.8890,-98.8888,-998.8,94700,
2,ZZV000ASFR4,2015-01-01 00:00:00+00,2014-12-31 22:48:00+00,19067,-998.8890,-98.8888,-998.8,59100,
3,ZZV000ASFR4,2015-01-01 00:00:00+00,2014-12-31 22:48:00+00,19067,-998.8890,-98.8888,-998.8,1110,
4,ZZV000ASFR4,2015-01-01 00:00:00+00,2014-12-31 22:48:00+00,19067,-998.8890,-98.8888,-998.8,20000,
...,...,...,...,...,...,...,...,...,...
1441826,AYM00089571,2015-01-17 00:00:00+00,2015-01-16 23:16:00+00,5705945,77.9672,-68.5744,18.0,27000,224.82
1441827,AYM00089571,2015-01-17 00:00:00+00,2015-01-16 23:16:00+00,5705945,77.9672,-68.5744,18.0,37400,228.39
1441828,AYM00089571,2015-01-17 00:00:00+00,2015-01-16 23:16:00+00,5705945,77.9672,-68.5744,18.0,10000,225.71
1441829,AYM00089571,2015-01-17 00:00:00+00,2015-01-16 23:16:00+00,5705945,77.9672,-68.5744,18.0,5300,233.76
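The `header=11` argument tells pandas to use the twelfth line of the file as the column header and to ignore the eleven lines before it. The notebook does not show what those lines contain, so the sketch below assumes a hypothetical metadata preamble and a reduced set of columns, purely to illustrate the mechanism.

```python
import io

import pandas

# Hypothetical preamble: 11 lines that precede the real column header
preamble = "".join(f"# metadata line {i}\n" for i in range(1, 12))
body = (
    "station_name,air_pressure,air_temperature\n"
    "ASM00094578,92600,287.55\n"
    "ASM00094578,73500,281.35\n"
)

# header=11 is zero-based: lines 0..10 are skipped, line 11 holds the column names
demo = pandas.read_csv(io.StringIO(preamble + body), header=11)
print(demo.columns.tolist())
```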


### Stations:

In [67]:
print('Stations requested: ' + str(len(igra.station_name.drop_duplicates())))

Stations requested: 582


### Points in time:

In [68]:
print('Ascents per month: ' + str(31 * 2))
print('datetimes requested: ' + str(len(igra.report_timestamp.drop_duplicates())))

igra_cleaned = igra
igra_cleaned.report_timestamp = pandas.to_datetime(igra_cleaned.report_timestamp, utc=True)
# DataFrame.append was removed in pandas 2.0; pandas.concat keeps the same row order
igra_cleaned = pandas.concat([igra_cleaned[igra_cleaned.report_timestamp.dt.hour == 12], igra_cleaned[igra_cleaned.report_timestamp.dt.hour == 0]])
igra_cleaned = igra_cleaned[igra_cleaned.report_timestamp.dt.minute == 0]
igra_cleaned = igra_cleaned[igra_cleaned.report_timestamp.dt.second == 0]

print('cleaned datetimes requested: ' + str(len(igra_cleaned.report_timestamp.drop_duplicates())))

Ascents per month: 62
datetimes requested: 62
cleaned datetimes requested: 62
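The cleaning step keeps only reports stamped exactly at 00:00:00 or 12:00:00 UTC. An equivalent filter can be written as a single boolean mask, sketched here on a few synthetic rows (the values are made up for illustration). Unlike the cell above, this variant keeps the original row order instead of placing the 12 UTC rows first.

```python
import pandas

demo = pandas.DataFrame({
    "report_timestamp": pandas.to_datetime(
        ["2015-01-01 00:00:00", "2015-01-01 11:30:00",
         "2015-01-01 12:00:00", "2015-01-01 12:12:00"], utc=True),
    "air_temperature": [224.8, 228.4, 225.7, 233.8],  # synthetic values
})

ts = demo["report_timestamp"]
# True only for timestamps exactly on 00:00:00 or 12:00:00 UTC
on_synoptic_hour = ts.dt.hour.isin([0, 12]) & (ts.dt.minute == 0) & (ts.dt.second == 0)
demo_cleaned = demo[on_synoptic_hour]

print("cleaned datetimes:", demo_cleaned["report_timestamp"].nunique())
```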


In [69]:
igra.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1441831 entries, 0 to 1441830
Data columns (total 9 columns):
 #   Column                             Non-Null Count    Dtype              
---  ------                             --------------    -----              
 0   station_name                       1441831 non-null  object             
 1   report_timestamp                   1441831 non-null  datetime64[ns, UTC]
 2   actual_time                        1441831 non-null  object             
 3   report_id                          1441831 non-null  int64              
 4   location_longitude                 1441831 non-null  float64            
 5   location_latitude                  1441831 non-null  float64            
 6   height_of_station_above_sea_level  1441831 non-null  float64            
 7   air_pressure                       1441831 non-null  int64              
 8   air_temperature                    1391871 non-null  float64            
dtypes: datetime64[ns, UTC](1), float64(4), int64(2), object(2)

In [70]:
igra_cleaned.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1441831 entries, 3012 to 1441830
Data columns (total 9 columns):
 #   Column                             Non-Null Count    Dtype              
---  ------                             --------------    -----              
 0   station_name                       1441831 non-null  object             
 1   report_timestamp                   1441831 non-null  datetime64[ns, UTC]
 2   actual_time                        1441831 non-null  object             
 3   report_id                          1441831 non-null  int64              
 4   location_longitude                 1441831 non-null  float64            
 5   location_latitude                  1441831 non-null  float64            
 6   height_of_station_above_sea_level  1441831 non-null  float64            
 7   air_pressure                       1441831 non-null  int64              
 8   air_temperature                    1391871 non-null  float64            
dtypes: datetime64[ns, UTC](1), float64(4), int64(2), object(2)

In [71]:
igra.air_temperature.isnull().sum()

49960

### Missing data:

In [72]:
print('Missing data: '+ str(100./len(igra)*igra.air_temperature.isnull().sum()) + '%')

Missing data: 3.465038551674919%
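The missing-data percentage is the number of null temperature values divided by the number of rows. A compact way to get the same figure is the mean of the null mask, sketched below on synthetic values. The last line repeats the arithmetic with the counts reported above (49960 null values out of 1441831 rows).

```python
import numpy as np
import pandas

# Synthetic column with one missing value out of four
demo = pandas.DataFrame({"air_temperature": [224.82, np.nan, 225.71, 233.76]})

# isna() yields booleans; their mean is the fraction of missing values
missing_pct = demo["air_temperature"].isna().mean() * 100
print(f"Missing data: {missing_pct:.2f}%")

# Same arithmetic with the IGRA counts shown above
igra_missing_pct = 100.0 * 49960 / 1441831
print(f"IGRA missing data: {igra_missing_pct:.3f}%")
```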


## COMP

In [73]:
files = glob.glob('REQUESTTEST/COMP/*.csv')
comp = pandas.read_csv(files[0])
comp

Unnamed: 0,obs_id,lat,lon,plev,ta,time,trajectory_label,statid,statindex
0,0,56.55,3.21,7320.0,212.90,2015-01-01 00:00:00,10000013441,0-20000-0-01400,0
1,1,56.55,3.21,8210.0,212.10,2015-01-01 00:00:00,10000013441,0-20000-0-01400,0
2,2,56.55,3.21,8980.0,215.10,2015-01-01 00:00:00,10000013441,0-20000-0-01400,0
3,3,56.55,3.21,10000.0,211.90,2015-01-01 00:00:00,10000013441,0-20000-0-01400,0
4,4,56.55,3.21,12500.0,216.10,2015-01-01 00:00:00,10000013441,0-20000-0-01400,0
...,...,...,...,...,...,...,...,...,...
11074133,11074133,7.12,125.65,70000.0,284.45,2015-01-31 12:12:00,40098134773,0-20000-0-98753,809
11074134,11074134,7.12,125.65,85000.0,289.95,2015-01-31 12:12:00,40098134773,0-20000-0-98753,809
11074135,11074135,7.12,125.65,92500.0,294.25,2015-01-31 12:12:00,40098134773,0-20000-0-98753,809
11074136,11074136,7.12,125.65,100000.0,298.65,2015-01-31 12:12:00,40098134773,0-20000-0-98753,809


### Stations:

In [74]:
print('Stations requested: ' + str(len(comp.statid.drop_duplicates())))

Stations requested: 810
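The two data sets label stations differently: IGRA uses identifiers such as `ASM00094578`, while COMP uses identifiers such as `0-20000-0-94578`. To check which stations appear in both, the identifiers need a common key. The sketch below assumes that the trailing five digits are the WMO station number in both schemes; that holds for the examples visible in this notebook but is an assumption, not something the notebook states. Identifiers that do not follow the pattern (for example `ZZV000ASFR4`) are dropped.

```python
import pandas

# Example identifiers taken from the outputs in this notebook
igra_ids = pandas.Series(["ASM00094578", "AYM00089571", "ZZV000ASFR4"])
comp_ids = pandas.Series(["0-20000-0-94578", "0-20000-0-01400", "0-20000-0-98753"])

# Assumed patterns: two-letter country code + "M" + "000" + WMO number (IGRA),
# and the "0-20000-0-" prefix followed by the WMO number (COMP)
igra_wmo = igra_ids.str.extract(r"^[A-Z]{2}M000(\d{5})$")[0].dropna()
comp_wmo = comp_ids.str.extract(r"^0-20000-0-(\d{5})$")[0].dropna()

common = sorted(set(igra_wmo) & set(comp_wmo))
print("Stations in both:", common)
```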


### Points in Time:

In [75]:
print('Ascents per month: ' + str(31 * 2))
print('datetimes requested: ' + str(len(comp.time.drop_duplicates())))

comp_cleaned = comp
comp_cleaned.time = pandas.to_datetime(comp_cleaned.time, utc=True)
# DataFrame.append was removed in pandas 2.0; pandas.concat keeps the same row order
comp_cleaned = pandas.concat([comp_cleaned[comp_cleaned.time.dt.hour == 12], comp_cleaned[comp_cleaned.time.dt.hour == 0]])
comp_cleaned = comp_cleaned[comp_cleaned.time.dt.minute == 0]
comp_cleaned = comp_cleaned[comp_cleaned.time.dt.second == 0]
print('cleaned datetimes requested: ' + str(len(comp_cleaned.time.drop_duplicates())))

Ascents per month: 62
datetimes requested: 9697
cleaned datetimes requested: 62
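The gap between 9697 distinct datetimes and 62 cleaned ones suggests that COMP stores many off-hour launch times (for example `12:12:00` in the table above), not just nominal 00 and 12 UTC. A quick way to inspect this is to count distinct launches per hour of day. The sketch below uses a handful of synthetic timestamps; `demo_times` is a hypothetical stand-in for `comp.time`.

```python
import pandas

# Synthetic launch times, including one duplicate and several off-hour values
demo_times = pandas.Series(pandas.to_datetime(
    ["2015-01-31 00:00:00", "2015-01-31 11:02:00", "2015-01-31 12:00:00",
     "2015-01-31 12:12:00", "2015-01-31 12:12:00", "2015-01-31 23:47:00"], utc=True))

print("distinct datetimes:", demo_times.nunique())

# Timestamps exactly on 00:00:00 or 12:00:00 UTC
exact = demo_times[demo_times.dt.hour.isin([0, 12])
                   & (demo_times.dt.minute == 0) & (demo_times.dt.second == 0)]
print("exactly on 00/12 UTC:", exact.nunique())

# Number of distinct launch times falling into each hour of the day
launches_per_hour = demo_times.drop_duplicates().dt.hour.value_counts().sort_index()
print(launches_per_hour)
```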


In [76]:
comp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11074138 entries, 0 to 11074137
Data columns (total 9 columns):
 #   Column            Dtype              
---  ------            -----              
 0   obs_id            int64              
 1   lat               float64            
 2   lon               float64            
 3   plev              float64            
 4   ta                float64            
 5   time              datetime64[ns, UTC]
 6   trajectory_label  int64              
 7   statid            object             
 8   statindex         int64              
dtypes: datetime64[ns, UTC](1), float64(4), int64(3), object(1)
memory usage: 760.4+ MB


In [77]:
comp_cleaned.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1751488 entries, 84 to 11074038
Data columns (total 9 columns):
 #   Column            Dtype              
---  ------            -----              
 0   obs_id            int64              
 1   lat               float64            
 2   lon               float64            
 3   plev              float64            
 4   ta                float64            
 5   time              datetime64[ns, UTC]
 6   trajectory_label  int64              
 7   statid            object             
 8   statindex         int64              
dtypes: datetime64[ns, UTC](1), float64(4), int64(3), object(1)
memory usage: 133.6+ MB


In [78]:
comp.ta.isnull().sum()

0

### Missing data:

In [79]:
print('Missing data: '+ str(100./len(comp)*comp.ta.isnull().sum()) + '%')

Missing data: 0.0%


---

# Comparing request times for IGRA (and its harmonized version) and the Comprehensive upper-air observation network (COMP)

### One station, one whole year: Brisbane (94578), 2001

We measure the time that elapses while sending the request, processing the data on the server, and downloading the result from the CDS.

## IGRA

In [95]:
t0 = time.time()

c = cdsapi.Client()
r = c.retrieve(
    'insitu-observations-igra-baseline-network',
    {
        'source': 'IGRA_H',
        'area': [
            -26, 152, -28,
            154,
        ],
        'format': 'csv-lev.zip',
        'variable': 'air_temperature',
        'day': [
            '01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12',
            '13', '14', '15',
            '16', '17', '18',
            '19', '20', '21',
            '22', '23', '24',
            '25', '26', '27',
            '28', '29', '30',
            '31',
        ],
        'month': ['01','02','03','04','05','06','07','08','09','10','11','12'],
        'year': '2001',
    },
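    'download.csv-lev.zip')  # a target passed to retrieve() already downloads the result once
if True:
    # Download again to 'download.zip'; the log below therefore shows two transfers
    r.download(target='download.zip')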
    # Check file size
    assert os.stat('download.zip').st_size == r.content_length, "Downloaded file is incomplete"
    
print("Time elapsed: ", time.time()-t0, "s")

2020-10-21 08:45:54,842 INFO Welcome to the CDS
2020-10-21 08:45:54,843 INFO Sending request to https://sis-dev.climate.copernicus.eu/api/v2/resources/insitu-observations-igra-baseline-network
2020-10-21 08:45:55,010 INFO Request is queued
2020-10-21 08:45:56,044 INFO Request is running
2020-10-21 08:46:03,276 INFO Request is completed
2020-10-21 08:46:03,276 INFO Downloading http://136.156.132.176/cache-compute-0000/cache/data1/adaptor.insitu_reference.retrieve_test-1603262760.124249-13174-8-b2a0e8b6-1254-488a-8a07-0a31c80a5bc6.zip to download.csv-lev.zip (224.8K)
2020-10-21 08:46:03,464 INFO Download rate 1.2M/s 
2020-10-21 08:46:03,520 INFO Downloading http://136.156.132.176/cache-compute-0000/cache/data1/adaptor.insitu_reference.retrieve_test-1603262760.124249-13174-8-b2a0e8b6-1254-488a-8a07-0a31c80a5bc6.zip to download.zip (224.8K)
2020-10-21 08:46:03,603 INFO Download rate 2.7M/s


Time elapsed:  8.914475679397583 s


In [96]:
z = zipfile.ZipFile('download.zip')
print("Unzipping retrieved files")
print(z.namelist())
z.extractall(path='./REQUESTTEST/IGRA2')
z.close()
os.remove('download.zip')

Unzipping retrieved files
['IGRA_H_20010101_20011231_subset_cdm-lev.csv']


In [97]:
files = glob.glob('REQUESTTEST/IGRA2/*.csv')
igra = pandas.read_csv(files[0], header=11)
igra

Unnamed: 0,station_name,report_timestamp,actual_time,report_id,location_longitude,location_latitude,height_of_station_above_sea_level,air_pressure,air_temperature
0,ASM00094578,2001-01-01 11:00:00+00,2001-01-01 11:00:00+00,4556978,153.129,-27.3917,4,92600,287.55
1,ASM00094578,2001-01-01 11:00:00+00,2001-01-01 11:00:00+00,4556978,153.129,-27.3917,4,73500,281.35
2,ASM00094578,2001-01-01 11:00:00+00,2001-01-01 11:00:00+00,4556978,153.129,-27.3917,4,10000,200.66
3,ASM00094578,2001-01-01 11:00:00+00,2001-01-01 11:00:00+00,4556978,153.129,-27.3917,4,16800,213.05
4,ASM00094578,2001-01-01 11:00:00+00,2001-01-01 11:00:00+00,4556978,153.129,-27.3917,4,15000,208.32
...,...,...,...,...,...,...,...,...,...
30060,ASM00094578,2001-12-31 12:00:00+00,2001-12-31 11:25:00+00,4559201,153.129,-27.3917,4,55300,269.51
30061,ASM00094578,2001-12-31 12:00:00+00,2001-12-31 11:25:00+00,4559201,153.129,-27.3917,4,52900,268.45
30062,ASM00094578,2001-12-31 12:00:00+00,2001-12-31 11:25:00+00,4559201,153.129,-27.3917,4,6860,200.25
30063,ASM00094578,2001-12-31 12:00:00+00,2001-12-31 11:25:00+00,4559201,153.129,-27.3917,4,10000,197.26


In [101]:
igra.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30065 entries, 0 to 30064
Data columns (total 9 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   station_name                       30065 non-null  object 
 1   report_timestamp                   30065 non-null  object 
 2   actual_time                        30065 non-null  object 
 3   report_id                          30065 non-null  int64  
 4   location_longitude                 30065 non-null  float64
 5   location_latitude                  30065 non-null  float64
 6   height_of_station_above_sea_level  30065 non-null  int64  
 7   air_pressure                       30065 non-null  int64  
 8   air_temperature                    30065 non-null  float64
dtypes: float64(3), int64(3), object(3)
memory usage: 2.1+ MB


### Points in Time:

In [103]:
print('Ascents: ' + str(365 * 2))
print('datetimes requested: ' + str(len(igra.report_timestamp.drop_duplicates())))

igra_cleaned = igra
igra_cleaned.report_timestamp = pandas.to_datetime(igra_cleaned.report_timestamp, utc=True)
# DataFrame.append was removed in pandas 2.0; pandas.concat keeps the same row order
igra_cleaned = pandas.concat([igra_cleaned[igra_cleaned.report_timestamp.dt.hour == 12], igra_cleaned[igra_cleaned.report_timestamp.dt.hour == 0]])
igra_cleaned = igra_cleaned[igra_cleaned.report_timestamp.dt.minute == 0]
igra_cleaned = igra_cleaned[igra_cleaned.report_timestamp.dt.second == 0]

print('cleaned datetimes requested: ' + str(len(igra_cleaned.report_timestamp.drop_duplicates())))

Ascents: 730
datetimes requested: 722
cleaned datetimes requested: 654


In [98]:
igra.air_temperature.isnull().sum()

0

### Missing data:

In [105]:
print('Missing data: '+ str(100./len(igra)*igra.air_temperature.isnull().sum()) + '%')

Missing data: 0.0%


## COMP

In [89]:
t0 = time.time()

c = cdsapi.Client()
r = c.retrieve(
    'insitu-comprehensive-upper-air-observation-network',
    {
        'variable': ['air_temperature',],
        'date': ['20010101-20011231'],
        'statid': '94578',
        'format': 'csv',
    })
if True:
    # Start Download
    r.download(target='download.zip')
    # Check file size
    assert os.stat('download.zip').st_size == r.content_length, "Downloaded file is incomplete"
print("Time elapsed: ", time.time()-t0, "s")

2020-10-21 08:41:59,053 INFO Welcome to the CDS
2020-10-21 08:41:59,054 INFO Sending request to https://sis-dev.climate.copernicus.eu/api/v2/resources/insitu-comprehensive-upper-air-observation-network
2020-10-21 08:41:59,334 INFO Request is queued
2020-10-21 08:42:00,367 INFO Request is running
2020-10-21 08:42:04,186 INFO Request is completed
2020-10-21 08:42:04,230 INFO Downloading http://136.156.132.176/cache-compute-0000/cache/data2/adaptor.comprehensive_upper_air.retrieve-1603262521.8786294-13174-7-c9713f65-01de-406d-b515-96e9b982b0f8.zip to download.zip (655.6K)
2020-10-21 08:42:04,520 INFO Download rate 2.2M/s 


Time elapsed:  5.60142707824707 s


In [90]:
z = zipfile.ZipFile('download.zip')
print("Unzipping retrieved files")
print(z.namelist())
z.extractall(path='./REQUESTTEST/COMP2')
z.close()
os.remove('download.zip')

Unzipping retrieved files
['temperature.csv']


In [91]:
files = glob.glob('REQUESTTEST/COMP2/*.csv')
comp = pandas.read_csv(files[0])
comp

Unnamed: 0,obs_id,lat,lon,plev,ta,time,trajectory_label,statid,statindex
0,0,-27.38,153.13,3370.0,215.30,2001-01-01 11:00:00,10000032129,0-20000-0-94578,0
1,1,-27.38,153.13,4200.0,213.70,2001-01-01 11:00:00,10000032129,0-20000-0-94578,0
2,2,-27.38,153.13,4760.0,207.30,2001-01-01 11:00:00,10000032129,0-20000-0-94578,0
3,3,-27.38,153.13,5000.0,208.90,2001-01-01 11:00:00,10000032129,0-20000-0-94578,0
4,4,-27.38,153.13,5180.0,209.30,2001-01-01 11:00:00,10000032129,0-20000-0-94578,0
...,...,...,...,...,...,...,...,...,...
86063,86063,-27.43,153.08,94800.0,296.15,2001-12-31 23:47:00,40064970378,0-20000-0-94578,0
86064,86064,-27.43,153.08,96800.0,296.95,2001-12-31 23:47:00,40064970378,0-20000-0-94578,0
86065,86065,-27.43,153.08,100000.0,300.45,2001-12-31 23:47:00,40064970378,0-20000-0-94578,0
86066,86066,-27.43,153.08,100300.0,300.95,2001-12-31 23:47:00,40064970378,0-20000-0-94578,0


In [100]:
comp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86068 entries, 0 to 86067
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   obs_id            86068 non-null  int64  
 1   lat               86068 non-null  float64
 2   lon               86068 non-null  float64
 3   plev              86068 non-null  float64
 4   ta                86068 non-null  float64
 5   time              86068 non-null  object 
 6   trajectory_label  86068 non-null  int64  
 7   statid            86068 non-null  object 
 8   statindex         86068 non-null  int64  
dtypes: float64(4), int64(3), object(2)
memory usage: 5.9+ MB


### Points in Time:

In [102]:
print('Ascents: ' + str(365 * 2))
print('datetimes requested: ' + str(len(comp.time.drop_duplicates())))

comp_cleaned = comp
comp_cleaned.time = pandas.to_datetime(comp_cleaned.time, utc=True)
# DataFrame.append was removed in pandas 2.0; pandas.concat keeps the same row order
comp_cleaned = pandas.concat([comp_cleaned[comp_cleaned.time.dt.hour == 12], comp_cleaned[comp_cleaned.time.dt.hour == 0]])
comp_cleaned = comp_cleaned[comp_cleaned.time.dt.minute == 0]
comp_cleaned = comp_cleaned[comp_cleaned.time.dt.second == 0]
print('cleaned datetimes requested: ' + str(len(comp_cleaned.time.drop_duplicates())))

Ascents: 730
datetimes requested: 2002
cleaned datetimes requested: 655


In [99]:
comp.ta.isnull().sum()

0

### Missing data:

In [104]:
print('Missing data: '+ str(100./len(comp)*comp.ta.isnull().sum()) + '%')

Missing data: 0.0%
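With both single-station tables loaded, one possible next step is to line up observations that share a timestamp and pressure level and to look at the temperature difference. The notebook itself does not do this; the following is a sketch under the assumption that `report_timestamp`/`air_pressure`/`air_temperature` (IGRA) correspond to `time`/`plev`/`ta` (COMP). The rows in `igra_demo` and `comp_demo` are synthetic.

```python
import pandas

# Synthetic IGRA-style rows
igra_demo = pandas.DataFrame({
    "report_timestamp": pandas.to_datetime(["2001-01-01 11:00:00"] * 3, utc=True),
    "air_pressure": [10000, 15000, 92600],
    "air_temperature": [200.66, 208.32, 287.55],
})

# Synthetic COMP-style rows
comp_demo = pandas.DataFrame({
    "time": pandas.to_datetime(["2001-01-01 11:00:00"] * 3, utc=True),
    "plev": [10000.0, 15000.0, 50000.0],
    "ta": [200.70, 208.30, 260.10],
})

# Bring the IGRA columns to the COMP names and dtypes, then join on time and level
left = igra_demo.rename(columns={"report_timestamp": "time", "air_pressure": "plev"})
left["plev"] = left["plev"].astype(float)
merged = left.merge(comp_demo, on=["time", "plev"], how="inner")

# Positive values mean COMP is warmer than IGRA at that level
merged["ta_diff"] = merged["ta"] - merged["air_temperature"]
print(merged[["time", "plev", "air_temperature", "ta", "ta_diff"]])
```

An inner join keeps only levels present in both tables; switching to `how="outer"` with `indicator=True` would additionally show which levels exist in only one data set.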
