# Example usage

To use `anespy` in a project:

In [1]:
import anespy
import anespy.anespy as anes

print(anespy.__version__)

## Loading ANES Data

One of the primary challenges of working with ANES data is that the files are scattered, and there is no *true* API for accessing them repeatedly. Downloading a file requires clicking a button for the format you want, so there are no user-facing static links. However, the site makes requests to a somewhat hidden internal API when you select a file to download, and this package uses that request system to acquire the datasets.

The function ```load_ANES_data(year, add_names = False)``` takes two arguments:
1. ```year```: year of the data you would like to access
2. ```add_names```: if you want to swap the variable names for their more complete, context-inclusive names (defaults to ```False```)

For example, say you wanted to pull the 2016 version of the main ANES Time Series:

In [None]:
data = anes.load_ANES_data(2016)
data.head()

At present, this package supports only the main ANES Time Series supplements going back to 2000. To check which editions are available, use the ```editions()``` function.

In [None]:
anes.editions()

But there is something essential to note here about the data we just loaded. Though it is a DataFrame, it also *isn't* a DataFrame. 

In [None]:
type(data)

As part of the acquisition, the data are instantiated as an ```ANES``` object. 

## The ```ANES``` Class

Part of this package is the ```ANES``` class, which is a child of ```pandas.DataFrame```. Much of the functionality of ```pandas``` applies to work with ANES data, but ANES Time Series instances share consistent properties that make additional methods worthwhile.

When you load ANES data, it is instantiated as an ```ANES``` object with a ```year``` property. For example, the data we just loaded:


In [None]:
data.year

This property is very useful for class methods, or if you're transforming and combining datasets from multiple years.  
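
For readers curious how a DataFrame subclass can carry a property such as `year`, here is a minimal sketch of the general pandas subclassing pattern. It is not the package's actual implementation: the class name `YearFrame` and the toy columns are hypothetical, and only `_metadata` and `_constructor` are standard pandas extension hooks.

```python
import pandas as pd


class YearFrame(pd.DataFrame):
    """Hypothetical stand-in for a DataFrame subclass that carries a year."""

    # Attributes named in _metadata are copied onto the results of
    # pandas operations, so derived frames keep the year.
    _metadata = ["year"]

    @property
    def _constructor(self):
        # Make pandas operations return YearFrame rather than a plain DataFrame.
        return YearFrame


# Toy data; these column names are illustrative, not real ANES variables.
frame = YearFrame({"V0001": [1, 2, 3], "V0002": [4, 5, 6]})
frame.year = 2016

subset = frame[["V0001"]]
print(type(subset).__name__, subset.year)
```

Because the subclass is still a `pandas.DataFrame`, ordinary DataFrame methods keep working while the extra attribute follows the data through selections.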

## Adding Years to the Data

An advantage of the ```ANES``` class is that certain transformations and functions can access the year of the Time Series automatically. For example, a common problem with ANES data is that they do not include a **Year** column by default; the year appears only inside the long **version** name. One of the built-ins with this package is the method ```add_year(self)```, which inserts a **Year** column at the beginning of the ```ANES``` object.

In [None]:
data.add_year()
data.head()

This can be especially useful for data intended to be exported, joining variables across time series, or merging ANES samples with other datasets from the same year. 
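
The idea of placing a year column first can be illustrated with plain pandas. This is a sketch under assumptions, not the body of `add_year`: the helper `with_year_first` and the toy column names are hypothetical, and unlike the method shown above it returns a copy rather than modifying the object in place.

```python
import pandas as pd


def with_year_first(df: pd.DataFrame, year: int) -> pd.DataFrame:
    """Return a copy of df with a constant Year column in position 0."""
    out = df.copy()
    # DataFrame.insert places the new column at the given position;
    # the scalar year is broadcast to every row.
    out.insert(0, "Year", year)
    return out


# Toy data; these column names are illustrative, not real ANES variables.
responses = pd.DataFrame({"V0001": [1, 2, 3], "V0002": [0.8, 1.1, 0.9]})
tagged = with_year_first(responses, 2016)
print(list(tagged.columns))  # ['Year', 'V0001', 'V0002']
```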

## Converting Variable Names

Something you might have noticed about the example data is that the variable names are non-identifying. Working with ANES data typically requires consulting a codebook to understand what each variable represents. This is only the beginning of the issues with ANES variable names, but this package includes ```convert_var_names(self, drop_extra = True)```, which replaces the variable names with their full titles from the codebook.


In [None]:
data.convert_var_names()
data.head()

But if you change your mind at any point, this transformation can be undone by calling the method again:

In [None]:
data.convert_var_names()
data.head()

Note that you may lose some variables because of a mismatch between the variables listed in the codebook and those actually provided in the data. If you would like to retain the extra data and manually look up these mystery variables, you can set ```drop_extra``` to ```False``` during the initial conversion.

In [None]:
data = anes.load_ANES_data(2016)
data.convert_var_names(drop_extra = False)
data.head()
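
To make the effect of ```drop_extra``` concrete, the following sketch imitates a codebook-driven rename with plain pandas. Everything here is an assumption for illustration: the `codebook` dictionary, the helper `rename_from_codebook`, and the toy variables are hypothetical and do not reflect how the package stores or applies its codebooks.

```python
import pandas as pd

# Hypothetical codebook: short variable code -> descriptive title.
codebook = {"V0001": "Case ID", "V0002": "Interview mode"}


def rename_from_codebook(df, codebook, drop_extra=True):
    """Rename columns using the codebook, optionally dropping unlisted ones."""
    if drop_extra:
        # Keep only the columns that the codebook knows about.
        df = df[[col for col in df.columns if col in codebook]]
    return df.rename(columns=codebook)


# "V9999" plays the role of a variable missing from the codebook.
raw = pd.DataFrame({"V0001": [1, 2], "V0002": [1, 2], "V9999": [7, 8]})

named = rename_from_codebook(raw, codebook)
kept = rename_from_codebook(raw, codebook, drop_extra=False)

# Undo the rename by inverting the mapping.
reverse = {title: code for code, title in codebook.items()}
restored = kept.rename(columns=reverse)

print(list(named.columns))
print(list(kept.columns))
print(list(restored.columns))
```

With `drop_extra=False`, the unlisted column survives under its original code, which is what allows it to be looked up manually later.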

## Recoding To Categories

Another disadvantage of ANES data 'as-is' is that most of the variables are factors but are inconsistently coded. This issue is partially resolved by the ```load_ANES_data``` function, yet there remains the issue of ambiguous values for categoricals. Packaged with ```anespy``` is ```recode_to_char```, which replaces the values for a selected column with their full character labels from the codebook.

For example, the 2008 edition includes some ideology variables, which are not very informative when left as numbers.

In [None]:
data = anes.load_ANES_data(2008)
data['V083099a']

After recoding:

In [None]:
data.recode_to_char('V083099a')

Now we have a complete understanding of what these variables represent.
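
As a rough illustration of what recoding to labels involves, the sketch below maps numeric codes to text with plain pandas. The `labels` dictionary and the `ideology` column are invented for the example; they are not taken from any ANES codebook, and `recode_to_char` may work differently internally.

```python
import pandas as pd

# Hypothetical value labels for a toy variable (not from a real codebook).
labels = {1: "Liberal", 2: "Moderate", 3: "Conservative"}

df = pd.DataFrame({"ideology": [1, 3, 2, -9]})

# Series.map replaces each code with its label; codes without a label
# (here -9) become NaN. Casting to "category" keeps the factor semantics.
recoded = df["ideology"].map(labels).astype("category")

print(recoded)
```

In this sketch, codes missing from the mapping become missing values, so it is worth checking for unexpected `NaN` entries after a recode of this kind.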

#### A Note About Variable Names

At present, this function is designed to work only with the variable names in their original "V_____" format. Because of the duplicated pre/post variable issue, some variables will return a `KeyError` after being converted to their full-context name.

## Split Pre & Post Variables

Another somewhat unbelievable issue with ANES datasets is that some years have duplicated variable codes. The first appearance represents the *pre-election* sample, while the second represents the *post-election* sample. As part of the ```convert_var_names``` functionality, specific variables are given "Pre" and "Post" tags. These can then be leveraged to split the variables into Pre and Post groups, which can be very useful for later analysis.

In [16]:
data = anes.load_ANES_data(2012)
data.convert_var_names()
data_pre, data_post = data.split_pre_post()

  


Converted to numbered variables.


In [17]:
data_pre

[Output: the `data_pre` frame, with rows indexed 0 through 5913; columns not shown.]
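
The splitting step can be pictured as a column selection on name tags. The sketch below assumes, purely for illustration, that converted names begin with "Pre" or "Post"; the exact tag format used by ```convert_var_names``` is not shown in this document, and the column names are hypothetical.

```python
import pandas as pd

# Toy frame; the "Pre: " / "Post: " prefixes are an assumed naming format.
df = pd.DataFrame(
    {
        "Case ID": [1, 2],
        "Pre: Interest in campaign": [3, 1],
        "Post: Interest in campaign": [2, 1],
    }
)

# Select columns by their tag.
pre_cols = [col for col in df.columns if col.startswith("Pre")]
post_cols = [col for col in df.columns if col.startswith("Post")]

df_pre, df_post = df[pre_cols], df[post_cols]

print(list(df_pre.columns))
print(list(df_post.columns))
```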


## Generate a Sample

Lastly, this package allows you to draw a sample from the object along a set of specific variables. This can be useful for designs involving re-sampling, exploratory statistical testing, or other functions where the entire set of respondents is not needed. 

The ```generate_sample``` method takes two arguments: `variables` (a list of variable names) and `n_respondents`, the size of the sample you want to extract.

In [21]:
data = anes.load_ANES_data(2004)
sample = data.generate_sample(list(data.columns.values[0:7]), n_respondents = 10)
sample

  


| | Version | Dsetid | V040001 | V040002 | V040101 | V040102 | V040103 |
|---|---|---|---|---|---|---|---|
| 139 | 2004NES_VERSION:2005AUG16 | 2004.T | 140 | 693 | 0.9129 | 0.9251 | 202 |
| 395 | 2004NES_VERSION:2005AUG16 | 2004.T | 396 | 576 | 1.2559 | 1.2903 | 242 |
| 223 | 2004NES_VERSION:2005AUG16 | 2004.T | 224 | 613 | 0.9218 | 0.8961 | 262 |
| 1197 | 2004NES_VERSION:2005AUG16 | 2004.T | 1199 | 588 | 0.9994 | 1.0542 | 232 |
| 491 | 2004NES_VERSION:2005AUG16 | 2004.T | 492 | 407 | 0.5177 | 0.5571 | 181 |
| 737 | 2004NES_VERSION:2005AUG16 | 2004.T | 738 | 757 | 1.7232 | 1.7703 | 141 |
| 369 | 2004NES_VERSION:2005AUG16 | 2004.T | 370 | 335 | 1.4699 | 1.5101 | 212 |
| 839 | 2004NES_VERSION:2005AUG16 | 2004.T | 840 | 951 | 1.4943 | 1.6498 | 161 |
| 216 | 2004NES_VERSION:2005AUG16 | 2004.T | 217 | 574 | 1.1943 | 1.0514 | 262 |
| 1115 | 2004NES_VERSION:2005AUG16 | 2004.T | 1117 | 644 | 0.8616 | 0.8852 | 142 |
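
A comparable draw can be sketched with pandas alone. This is an approximation under assumptions rather than the package's code: the helper `draw_sample`, its `seed` parameter, and the toy columns are hypothetical, and only `DataFrame.sample` is a real pandas call.

```python
import pandas as pd


def draw_sample(df, variables, n_respondents, seed=None):
    """Select the given columns, then draw rows without replacement."""
    return df[variables].sample(n=n_respondents, random_state=seed)


# Toy respondents; these column names are illustrative.
respondents = pd.DataFrame(
    {
        "case_id": range(1, 101),
        "weight": [1.0] * 100,
        "other": ["x"] * 100,
    }
)

subsample = draw_sample(respondents, ["case_id", "weight"], n_respondents=10, seed=0)
print(subsample.shape)  # (10, 2)
```

Passing a fixed `seed` makes the draw reproducible, which helps when a re-sampling design needs to be rerun.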
