# Week 2: Data Structures and Dataframe Operations

**Objectives**: Today we are going to explore both Python-specific and general data structures that are important to data science. We will cover the following:
  
* Variables and datatypes
* Strings, lists, and tuples
* Arrays, dataframes and series
* Basic dataframe operations
* Dictionaries and JSON

## Variables and Basic Datatypes

Variables in Python are names for values that are stored in memory. By convention, variable names are written in lowercase and describe the value they hold, like:

<code>counter = 2</code>

Unlike statically typed languages, Python uses dynamic typing, so the datatype is inferred from the value assigned to the variable. In the <code>counter</code> example above, Python treats the value 2 as an integer, which is type <code>int</code>.
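As a small sketch of what dynamic typing means in practice, the snippet below rebinds one name to values of different types and checks the type each time. The variable name <code>value</code> and the stored results are hypothetical and used only for illustration.

```python
# "value" is a hypothetical name used only for this illustration
value = 2                   # Python infers int from the literal 2
first_type = type(value)

value = 2.0                 # the same name now refers to a float
second_type = type(value)

value = "two"               # and now to a str
third_type = type(value)

print(first_type is int)     # True
print(second_type is float)  # True
print(third_type is str)     # True
```

No type declaration appears anywhere: the type always follows the value currently assigned to the name.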

Knowing that <code>counter</code> is an integer is critical if you plan to do any calculations with it. For example, if you wanted to divide the value in <code>counter</code> by 4, expecting the answer to be 0.5, you would be mistaken: in Python 2.7, dividing one <code>int</code> by another performs integer division and returns 0. Try this in the code block below:

In [None]:
# Assign 2 to a variable named "counter"

# Divide "counter" by four

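The integer division pitfall can be sketched as follows. This sketch uses the floor-division operator <code>//</code>, which gives the same result in Python 2.7 and Python 3, so the truncation is visible on either interpreter. In Python 2.7, plain <code>/</code> between two ints behaves the same way, while in Python 3 it returns a float. The divisor 4 and the result names are chosen only for illustration.

```python
counter = 2

# Floor division of two ints discards the fractional part.
# In Python 2.7, counter / 4 gives this same result.
int_result = counter // 4

# Converting one operand to float keeps the fractional part.
float_result = float(counter) / 4

print(int_result)    # 0
print(float_result)  # 0.5
```

If you need a fractional answer in Python 2.7, make sure at least one operand is a <code>float</code> before dividing.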

**Some Basic Built-in Datatypes**

Python 2.7.x includes the following built-in numeric datatypes (it also has <code>long</code> for integers of unlimited size):

* <code>int</code>: plain integers like 4
* <code>float</code>: floating point numbers like 4.0
* <code>complex</code>: complex numbers like 4j

To determine the datatype of any object, you can use the <code>type</code> function like:

<code>type(object)</code>
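As a quick sketch, here is <code>type</code> applied to one value of each numeric type listed above. The variable names are hypothetical.

```python
# Illustrative values only; any object can be passed to type()
whole = 4
decimal = 4.0
imaginary = 4j

print(type(whole) is int)          # True
print(type(decimal) is float)      # True
print(type(imaginary) is complex)  # True
```

Note that 4 and 4.0 are different types even though they represent the same quantity.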

Many analytic tasks use numeric types, so it is good to be familiar with them. Beyond numeric values, many analytic tasks also involve other built-in types like:

Boolean types:
* <code>bool</code>: boolean like True or False

Sequence types:
* <code>str</code>: string like this sequence of characters "this is a string"
* <code>list</code>: list like this list of ints: [1,2,3]
* <code>tuple</code>: tuple like this tuple of ints: (1,2,3)
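The types above can be checked the same way as the numeric ones. This is a minimal sketch with hypothetical variable names; it prints each type's name and uses <code>len</code> to count the elements in each sequence.

```python
flag = True
text = "this is a string"
numbers_list = [1, 2, 3]
numbers_tuple = (1, 2, 3)

# Collect the name of each value's type
type_names = [type(v).__name__ for v in (flag, text, numbers_list, numbers_tuple)]
print(type_names)          # ['bool', 'str', 'list', 'tuple']

# Sequence types have a length; a string counts its characters
print(len(text))           # 16
print(len(numbers_list))   # 3
print(len(numbers_tuple))  # 3
```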

You can learn more about the details of types here: https://docs.python.org/2/library/stdtypes.html

We are not going to go into the significantly more complex inner workings of these types, but understanding the basics of accessing data is important to a data scientist, and there are some common approaches regardless of type.

For a sequence type, there are two ways to get parts of the sequence: indexing and slicing.

**Indexing**

Python uses zero-based indexing, which means that the first element has an index of zero. To access a specific value, you append the index of that value in square brackets to the sequence like:

<code>my_sequence[0] # The zero in square brackets returns the first or 0th element </code>
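A short sketch of indexing on each sequence type. The sample data and variable names are made up for illustration.

```python
# Hypothetical example data
my_list = [10, 20, 30, 40, 50]
my_string = "hello"
my_tuple = (True, False, True, False, True)

print(my_list[0])    # 10, the first (0th) element
print(my_string[1])  # e, the second character
print(my_tuple[4])   # True, the fifth and last element
```

The same square-bracket syntax works for strings, lists, and tuples. Because counting starts at zero, the last valid index is always one less than the length of the sequence.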

**Slicing**

Slices work by extending the idea of an index with an operator that allows you to select a range of index locations in a sequence type. Specifically, you use <code>:</code> and index locations to retrieve parts of the sequence like:

````
my_sequence[start:end] # items from start through end-1
my_sequence[start:]    # items from start through the rest of the sequence
my_sequence[:end]      # items from the beginning through end-1
my_sequence[:]         # the whole sequence
````
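The four slice patterns above can be tried on a concrete sequence. This is a sketch with made-up data; the same patterns apply to strings and tuples.

```python
# Hypothetical example data
my_sequence = [10, 20, 30, 40, 50]

print(my_sequence[1:3])  # [20, 30], index 1 up to but not including 3
print(my_sequence[2:])   # [30, 40, 50]
print(my_sequence[:2])   # [10, 20]
print(my_sequence[:])    # [10, 20, 30, 40, 50]

# Slicing a string returns a string; slicing a tuple returns a tuple
word = "python"
print(word[1:4])         # yth
pair = (1, 2, 3)[1:]
print(pair)              # (2, 3)
```

Notice that the element at the <code>end</code> index is never included in the result.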

Enter some different sequence types below and index and slice them to return their elements.

In [None]:
# Assign a string type sequence with five elements to a variable name

# Print the variable

# Return the 4th element

# Return the 2nd element to the end of the sequence

# Assign a list type sequence with five elements to a variable name

# Print the list

# Return the 2nd element

# Return the 1st element to the end of the sequence

# Assign a tuple type sequence with five elements that are bools to a variable name

# Print the tuple

# Return the 2nd element

# Return the 1st element to the end of the sequence

## Arrays and Dataframes

The basic variables and sequence types are quite useful for a variety of tasks, but arrays and dataframes are the workhorses of analytics and data science. Arrays and dataframes are great not only because they can represent multidimensional data like a spreadsheet, but also because they use vectorized code. In Python, this means that operations on NumPy arrays and pandas dataframes run in optimized C code instead of explicit Python <code>for</code> loops, so the code you write can be much more concise and much faster.
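To make the idea of vectorized code concrete, here is a minimal sketch that doubles every value in a small dataset twice: once with a Python <code>for</code> loop and once with a single NumPy array operation. It assumes NumPy is installed, and the data and variable names are made up for illustration. Only NumPy is shown here; arithmetic on pandas dataframes and series is written in the same loop-free style.

```python
import numpy as np

# Hypothetical example data
values = [1.0, 2.0, 3.0, 4.0]

# Loop version: Python visits each element one at a time
doubled_loop = []
for v in values:
    doubled_loop.append(v * 2)

# Vectorized version: one expression applies to the whole array
arr = np.array(values)
doubled_vec = arr * 2

print(doubled_loop)          # [2.0, 4.0, 6.0, 8.0]
print(doubled_vec.tolist())  # [2.0, 4.0, 6.0, 8.0]
```

Both versions produce the same numbers. With only four values the speed difference is negligible, but the vectorized form is already shorter, and the gap in speed grows with the size of the data.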

