<!--BOOK_INFORMATION-->
<img align="left" style="padding-right:10px;" src="figures/k2_pix_small.png">
*This notebook contains an excerpt of instructional material from [gully](https://twitter.com/gully_) and the [K2 Guest Observer Office](https://keplerscience.arc.nasa.gov/); the content is available [on GitHub](https://github.com/gully/goldenrod).*


<!--NAVIGATION-->
< [KEGS metadata and sample overview](01.01-KEGS-sample-overview.ipynb) | [Contents](Index.ipynb) | [Fetch all the KEGS data with `wget`](01.03-wget-all-KEGS-data.ipynb) >

# Custom tile target apertures on galaxies


The Kepler spacecraft is over 1 AU away from the Earth, and relies on low-bandwidth communication to telemeter data back to Earth.  Accordingly, only about $\sim5$% of the pixels can be telemetered, so we download small postage-stamp windows rather than the full CCD mosaic.  The sizes of these *target apertures* are assigned programmatically by software designed for the original Kepler prime mission and later retrofitted for K2.  The aperture size needed for the science sometimes exceeds the programmatically assigned size.  In these cases, manual overrides called "tiles" are placed on the desired target.  These tiles sometimes violate the simplifying assumptions of the Kepler pipeline, which complicates the extraction of lightcurves.  See the [K2 Handbook](http://archive.stsci.edu/k2/manuals/k2_handbook.pdf) Section 2.2 for more details on K2 target management.

Because of this added complexity, it is worth checking how many of the KEGS targets were assigned custom apertures, the so-called "tiles".

In [1]:
import pandas as pd
import numpy as np

Change the path below to wherever you have cloned the [K2-metadata repository](https://github.com/gully/k2-metadata).

In [2]:
K2_GO_metadata_path = '../../K2-metadata/metadata/tidy/GO_proposal_metadata.csv'

In [3]:
df_GO = pd.read_csv(K2_GO_metadata_path, dtype={'campaign': str},
                    usecols=['EPIC ID', 'Investigation IDs', 'campaign'])
df_GO.shape

(426130, 3)

We'll find all the KEGS targets in the same way as we did in a previous notebook.

In [4]:
KEGS_ids = ['GO14078','GO14079','GO12068','GO12116','GO10053','GO10070',
            'GO8053','GO8070','GO6077','GO5096','GO4096','GO3048','GO1074','GO0103']

search_pattern = '|'.join(KEGS_ids)
df_GO['KEGS_target'] = df_GO['Investigation IDs'].str.contains(search_pattern)
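# .copy() avoids a SettingWithCopyWarning when columns are added to the filtered frame later
df_GO = df_GO[df_GO['KEGS_target']].copy()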
df_GO.head()

Unnamed: 0,EPIC ID,Investigation IDs,campaign,KEGS_target
235,202059522,GO0009_LC|GO0035_LC|GO0061_LC|GO0074_LC|GO010...,0,True
252,202060054,GO0013_LC|GO0063_LC|GO0075_LC|GO0077_LC|GO010...,0,True
508,202062048,GO0035_LC|GO0103_LC|GO0106_LC,0,True
2667,202074356,GO0103_LC|GO0106_LC,0,True
2668,202074357,GO0103_LC|GO0106_LC,0,True
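
The filter above works because `str.contains` treats its argument as a regular expression, in which `|` means "or". A minimal sketch of the same idea on a made-up table follows; the `toy` frame and its EPIC IDs are hypothetical and do not come from the metadata file.

```python
import pandas as pd

# Hypothetical rows that mimic the 'Investigation IDs' column.
toy = pd.DataFrame({
    'EPIC ID': [1, 2, 3],
    'Investigation IDs': ['GO0103_LC|GO0106_LC',
                          'GO0009_LC',
                          'GO14078|M95|GALAXY_TILE'],
})

# Joining with '|' builds a regex alternation: match GO0103 OR GO14078.
toy_pattern = '|'.join(['GO0103', 'GO14078'])
toy_mask = toy['Investigation IDs'].str.contains(toy_pattern)

# .copy() makes the filtered frame independent of the original,
# so new columns can be added to it without warnings.
toy_kegs = toy[toy_mask].copy()
print(toy_kegs['EPIC ID'].tolist())
```

Note that this is a substring match, so a suffix such as `_LC` on an ID does not prevent it from being found.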


In [5]:
df_GO['Is_TILE'] = df_GO['Investigation IDs'].str.contains('TILE|SQUARE_GALAXY')

In [6]:
df_GO.head()

Unnamed: 0,EPIC ID,Investigation IDs,campaign,KEGS_target,Is_TILE
235,202059522,GO0009_LC|GO0035_LC|GO0061_LC|GO0074_LC|GO010...,0,True,False
252,202060054,GO0013_LC|GO0063_LC|GO0075_LC|GO0077_LC|GO010...,0,True,False
508,202062048,GO0035_LC|GO0103_LC|GO0106_LC,0,True,False
2667,202074356,GO0103_LC|GO0106_LC,0,True,False
2668,202074357,GO0103_LC|GO0106_LC,0,True,False


In [7]:
df_GO.Is_TILE.value_counts()

False    40019
True       108
Name: Is_TILE, dtype: int64

There are 108 instances in which a KEGS galaxy target is on a tile.  The vast majority of KEGS targets are on "regular", programmatically assigned apertures.
Let's take a look at those 108.

In [8]:
df_GO[df_GO.Is_TILE].tail()

Unnamed: 0,EPIC ID,Investigation IDs,campaign,KEGS_target,Is_TILE
361150,200183082,GO14078|NGC3412|GALAXY_TILE,14,True,True
361151,200183083,GO14078|NGC3412|GALAXY_TILE,14,True,True
361152,200183084,GO14078|NGC3412|GALAXY_TILE,14,True,True
361153,200183085,GO14078|NGC3412|GALAXY_TILE,14,True,True
361154,200183086,GO14078|NGC3412|GALAXY_TILE,14,True,True


Indeed, it looks like galaxies that subtend large solid angles on the sky, like NGC3412, were assigned custom masks.  Let's find the unique entries only.
This takes some slightly advanced pandas operations: data cleaning, string manipulation, aggregation, and filtering.

In [9]:
df_GO['Investigation IDs'] = df_GO['Investigation IDs'].str.strip(' ')

In [10]:
df_GO[df_GO.Is_TILE]['Investigation IDs'].str.split('|').tail()

361150    [GO14078, NGC3412, GALAXY_TILE]
361151    [GO14078, NGC3412, GALAXY_TILE]
361152    [GO14078, NGC3412, GALAXY_TILE]
361153    [GO14078, NGC3412, GALAXY_TILE]
361154    [GO14078, NGC3412, GALAXY_TILE]
Name: Investigation IDs, dtype: object

In [11]:
df_GO['Investigator_list'] = df_GO['Investigation IDs'].str.split('|')

In [12]:
tile_targets = df_GO.Investigator_list[df_GO.Is_TILE].reset_index(drop=True)

In [13]:
tile_targets.tail()

103    [GO14078, NGC3412, GALAXY_TILE]
104    [GO14078, NGC3412, GALAXY_TILE]
105    [GO14078, NGC3412, GALAXY_TILE]
106    [GO14078, NGC3412, GALAXY_TILE]
107    [GO14078, NGC3412, GALAXY_TILE]
Name: Investigator_list, dtype: object

In [14]:
KEGS_targs_on_tiles = tile_targets.aggregate(np.concatenate)

In [15]:
np.unique(KEGS_targs_on_tiles)

array(['GALAXY_TILE', 'GO14078', 'M105', 'M95', 'M96', 'NGC3384',
       'NGC3412', 'NGC3423', 'SQUARE_GALAXY'],
      dtype='<U13')
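
As an alternative to aggregating with `np.concatenate`, the same list of unique tokens can probably be obtained with `Series.explode`, which turns each list element into its own row. This is only a sketch on a hypothetical three-row series (`toy_ids` is invented for illustration), not the method used in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for the tile rows of 'Investigation IDs'.
toy_ids = pd.Series(['GO14078|M95|GALAXY_TILE',
                     'GO14078|M95|GALAXY_TILE',
                     'GO14078|SQUARE_GALAXY'])

# split -> one list per row; explode -> one token per row.
tokens = toy_ids.str.split('|').explode()

# Unique tokens, sorted for a stable display order.
unique_tokens = sorted(tokens.unique())
print(unique_tokens)
```

The result stays a pandas object until the last step, so further filtering or counting (for example `tokens.value_counts()`) is straightforward.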

In [16]:
set(KEGS_targs_on_tiles) - set(KEGS_ids)

{'GALAXY_TILE',
 'M105',
 'M95',
 'M96',
 'NGC3384',
 'NGC3412',
 'NGC3423',
 'SQUARE_GALAXY'}

So six named galaxies observed by KEGS (M95, M96, M105, NGC3384, NGC3412, and NGC3423) were large enough on the sky to require custom `TILE` apertures, and there is one more `SQUARE_GALAXY` category, presumably for a cluster of a few galaxies.
These targets all originate from the [GO14078 proposal](https://keplerscience.arc.nasa.gov/data/k2-programs/GO14078.txt).
As noted in the previous notebook, these tiles can cause over-counting of targets.  For example, if you simply counted all the unique EPIC IDs associated with a proposal, you would get a number higher than the number of unique targets, because one tiled galaxy spans several EPIC IDs.
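
To make the over-counting concrete, here is a small sketch with invented EPIC IDs. It assumes, for illustration only, that the galaxy name is the second `|`-separated token, as in the `GO14078|NGC3412|GALAXY_TILE` rows shown earlier; the `target_name` column is hypothetical.

```python
import pandas as pd

# Hypothetical tile rows: two galaxies, each covered by two EPIC IDs.
toy = pd.DataFrame({
    'EPIC ID': [200000001, 200000002, 200000003, 200000004],
    'Investigation IDs': ['GO14078|M95|GALAXY_TILE',
                          'GO14078|M95|GALAXY_TILE',
                          'GO14078|M96|GALAXY_TILE',
                          'GO14078|M96|GALAXY_TILE'],
})

# Assumption: the galaxy name is the second '|'-separated token.
toy['target_name'] = toy['Investigation IDs'].str.split('|').str[1]

n_epic_ids = toy['EPIC ID'].nunique()     # what a naive count would report
n_targets = toy['target_name'].nunique()  # number of distinct galaxies
print(n_epic_ids, n_targets)
```

Counting EPIC IDs gives 4 here, while only 2 distinct galaxies were observed.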

In [17]:
in_GO14078 = df_GO['Investigation IDs'].str.contains('GO14078')
in_GO14078.sum()

108

Match these sources with the K2 target index.

In [18]:
targ_index_path = '../../k2-target-index/k2-target-pixel-files.csv.gz'

In [19]:
%time df_targ = pd.read_csv(targ_index_path)

CPU times: user 13 s, sys: 1.1 s, total: 14.1 s
Wall time: 13.5 s


In [20]:
df_comb = pd.merge(df_GO[in_GO14078], df_targ, how='left',
                   left_on='EPIC ID', right_on='keplerid')

In [21]:
df_comb[df_comb.columns[0:10]].head()

Unnamed: 0,EPIC ID,Investigation IDs,campaign_x,KEGS_target,Is_TILE,Investigator_list,filename,url,filesize,object
0,200182979,GO14078|SQUARE_GALAXY,14,True,True,"[GO14078, SQUARE_GALAXY]",,,,
1,200182980,GO14078|SQUARE_GALAXY,14,True,True,"[GO14078, SQUARE_GALAXY]",,,,
2,200182981,GO14078|SQUARE_GALAXY,14,True,True,"[GO14078, SQUARE_GALAXY]",,,,
3,200182982,GO14078|SQUARE_GALAXY,14,True,True,"[GO14078, SQUARE_GALAXY]",,,,
4,200182983,GO14078|SQUARE_GALAXY,14,True,True,"[GO14078, SQUARE_GALAXY]",,,,


In [22]:
df_comb.filename.nunique()

0

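None of the GO14078 rows picked up a filename in the left join, so none of these EPIC IDs matched an entry in the index: Campaign 14 is not yet in the K2-target index!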
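
One way to check how many rows of a left join found a partner is the `indicator=True` option of `pd.merge`, which adds a `_merge` column. The sketch below uses two tiny invented tables (`left` and `index`, with made-up IDs and filenames), so it only illustrates the mechanism.

```python
import pandas as pd

# Hypothetical target list and a hypothetical target index.
left = pd.DataFrame({'EPIC ID': [10, 11, 12]})
index = pd.DataFrame({'keplerid': [10, 99],
                      'filename': ['a.fits', 'z.fits']})

# indicator=True records where each row came from:
# 'both' means matched, 'left_only' means no match in the index.
merged = pd.merge(left, index, how='left',
                  left_on='EPIC ID', right_on='keplerid', indicator=True)

n_matched = int((merged['_merge'] == 'both').sum())
n_unmatched = int((merged['_merge'] == 'left_only').sum())
print(n_matched, n_unmatched)
```

In a case like the one above, where the index lacks a whole campaign, every row would be `left_only`.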
