# Slicing Data in Python, An In-depth Overview

#### The many ways to slice data in Python.

### Abstract:

While performing exploratory data analysis, one often moves quickly between different data structures, and the question arises: "How do I extract or filter the data I need?" This overview of Python data structures attempts to unveil the various methods in concise chunks, so that they can be compared, contrasted, and committed to memory! At the end of the talk, the hope is that slicing and filtering data in Python will be as intuitive as slicing bread.

### Some Resources to Share:
- Are you just getting started with Python?
    - [Google Python Edu Course](https://developers.google.com/edu/python/)
- [DataCamp String Tutorial](https://www.datacamp.com/community/tutorials/python-string-tutorial)
- [DataCamp Dictionary Tutorial](https://www.datacamp.com/community/tutorials/python-dictionary-tutorial)
- [DataCamp NumPy Array Tutorial](https://www.datacamp.com/community/tutorials/python-numpy-tutorial)
- [Python Data Structures](https://docs.python.org/3.7/tutorial/datastructures.html)

----

## Lists

----

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Generating a list of random numbers as strings</h4>
<p style="color: #cccccc;">In the following example, the <a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.randn.html#numpy.random.randn">randn</a> function of the NumPy random module is used to generate 14 random values from the "standard normal" distribution. Then string formatting is used to limit each value to 4 decimal places. The resulting list of strings is saved to a variable called <em>my_list</em>.</p>
</div>

In [1]:
import numpy as np

my_list = [f'{x:.4f}' for x in np.random.randn(14)]

In [2]:
print(my_list)

['-0.4634', '0.5072', '0.1838', '-1.4578', '-0.5278', '0.2062', '1.4527', '0.9543', '0.4467', '-0.4148', '0.1389', '1.0753', '1.1266', '-0.4051']


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing single values from a list</h4>
<p style="color: #cccccc;">To pull out items from a list, square bracket notation is used to index the list. In the following example, first the fifth item is printed and then the fourth-from-last item. Python sequences are zero-indexed, and negative indexes count back from the end.</p>
</div>

In [3]:
print("fifth list element:", my_list[4], "\nfourth last list element:", my_list[-4])

fifth list element: -0.5278 
fourth last list element: 0.1389
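The relationship between positive and negative indexes can be sketched with a small, made-up list. The name `letters` is an assumption for illustration and is not part of the notebook.

```python
# Hypothetical sample data, chosen so positions are easy to read.
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

fifth = letters[4]                       # zero-indexed: index 4 is the fifth item
fourth_last = letters[-4]                # negative indexes count back from the end
same_item = letters[len(letters) - 4]    # -4 is shorthand for len(letters) - 4

print(fifth, fourth_last, same_item)     # e d d
```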


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing an index range from a list</h4>
<p style="color: #cccccc;">To get any subset of sequential items from a list, square bracket notation is used again with a colon separating the starting index value and the <strong>exclusive</strong> ending index value.</p>
</div>

In [4]:
print(my_list[3:7])

['-1.4578', '-0.5278', '0.2062', '1.4527']
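A short sketch of what the exclusive stop index means in practice, using a hypothetical list called `numbers` so the results are easy to check by eye. It also shows the defaults when the start or stop is left out.

```python
numbers = list(range(10))    # hypothetical data: 0 through 9

middle = numbers[3:7]        # indexes 3, 4, 5, 6; the stop index 7 is excluded
head = numbers[:3]           # an omitted start means "from the beginning"
tail = numbers[7:]           # an omitted stop means "through the end"

# Because the stop is exclusive, adjacent slices rejoin without overlap.
rejoined = head + middle + tail

# Unlike single-item indexing, a slice that runs past the end does not raise.
clipped = numbers[8:100]

print(middle)     # [3, 4, 5, 6]
print(clipped)    # [8, 9]
```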


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Non-sequential index range from a list</h4>
<p style="color: #cccccc;">To get a non-sequential subset of items from a list, square bracket notation is used again, with a second colon followed by the step (interval) between returned items: <em>start:stop:step</em>. The ending index value is still <strong>exclusive</strong>. For example, passing 2 as the step will return the first item and skip the second, then repeat the same return-skip pattern starting at the third item. Passing 4 as the step will produce a return-skip-skip-skip pattern.</p>
</div>

In [5]:
print(my_list[::2])

['-0.4634', '0.1838', '-0.5278', '1.4527', '0.4467', '0.1389', '1.1266']
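The return-skip patterns are easier to see on consecutive integers. This is only a sketch on a hypothetical list, `numbers`, rather than the random values above.

```python
numbers = list(range(10))        # hypothetical data: 0 through 9

every_second = numbers[::2]      # return-skip:            [0, 2, 4, 6, 8]
every_fourth = numbers[::4]      # return-skip-skip-skip:  [0, 4, 8]
window = numbers[1:8:2]          # start 1, stop before 8: [1, 3, 5, 7]
backwards = numbers[::-1]        # a negative step walks from the end to the start
```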


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Generating a 2-D list of lists</h4>
<p style="color: #cccccc;">The following code generates a list of lists with 5 random numbers in each of the 5 lists. The list of lists, <em>l_of_l</em> is then printed.</p>
</div>

In [6]:
list_of_lists = [x for x in np.random.randn(5, 5)]

l_of_l = []
for x in list_of_lists:
    l_of_l.append([round(i, 2) for i in x])

In [7]:
# A 2-D list
l_of_l

[[0.38, -0.75, -0.58, -0.41, -1.53],
 [-0.46, 0.65, 2.02, 0.21, 0.87],
 [-1.44, 0.8, 0.41, -0.2, -0.2],
 [0.43, -0.33, 0.69, -0.27, 0.71],
 [0.84, 1.06, -0.66, 1.61, 0.92]]

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing data from a list of lists</h4>
<p style="color: #cccccc;">In the following example we slice the fourth random number from the fourth list. We achieve this by first using square bracket notation to select the row, then following it with a second pair of square brackets to select the item within that row. Remember this notation because it contrasts with the less verbose and more efficient NumPy syntax for 2-dimensional arrays.</p>
</div>

In [8]:
print(l_of_l[3][3])

-0.27


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Performing element-wise operations in list arrays</h4>
<p style="color: #cccccc;">If we want the fourth element from the first 3 rows, we don't have a concise syntax within the above mentioned extended indexing notation. To perform this task we need a for loop and if we were operating on these elements we would require a nested for loop.</p>
</div>

In [9]:
for l in l_of_l[:3]:
    print(l[3])

-0.41
0.21
-0.2


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Pythonic Zen</h4>
<p style="color: #cccccc;">A more Pythonic and efficient approach would utilize a list comprehension for the above task.</p>
</div>

In [10]:
[l[3] for l in l_of_l[:3]]

[-0.41, 0.21, -0.2]
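The same comprehension idea extends to other "column" tasks on a list of lists. The following is a minimal sketch on a made-up 3x3 `grid` (an assumption for illustration, not the notebook's data).

```python
# Hypothetical 3x3 list of lists.
grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# One column: pick the same position from every row.
second_column = [row[1] for row in grid]

# A rectangular block: slice the rows, then slice inside each row.
top_right = [row[1:] for row in grid[:2]]

# All columns at once: zip(*grid) groups items by position.
columns = [list(col) for col in zip(*grid)]
```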

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Some background information</h4>
<p style="color: #cccccc;">It is worth mentioning some background information that should augment your understanding of slicing operations. The <a href="https://docs.python.org/3.7/library/functions.html?highlight=slice#slice"><em>slice</em></a> built-in class is the underlying workhorse of extended indexing syntax and it is utilized throughout the numerical Python package (NumPy). See the link to read more about this built-in Python class. The following example applies the same extended indexing shown at the beginning of this document, this time through an explicit slice object.</p>
</div>

In [11]:
my_slice = slice(1, 8, 2)
my_list[my_slice]

['0.5072', '-1.4578', '0.2062', '0.9543']
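As a rough sketch of why a slice object can be handy: it can be given a name, reused, and inspected. The list `readings` and the name `PAYLOAD` are hypothetical.

```python
# Hypothetical record: a header, four data fields, and a trailer.
readings = ['id', 'alpha', 'beta', 'gamma', 'delta', 'checksum']

# Naming a slice documents intent and lets it be reused.
PAYLOAD = slice(1, -1)          # everything except the first and last item
payload = readings[PAYLOAD]

# Square bracket syntax builds an equivalent slice behind the scenes.
same = readings[1:-1]

# A slice exposes its parts, and indices() resolves them for a given length.
s = slice(1, 8, 2)
parts = (s.start, s.stop, s.step)
resolved = s.indices(5)         # stop is clipped to a sequence of length 5
```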

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">An alternate slice object that iterates</h4>
<p style="color: #cccccc;">If you want the output of the <em>slice</em> class to be iterable, you will have to look elsewhere. This is where our first standard library module comes to the rescue. Itertools has a function called <a href="https://docs.python.org/3/library/itertools.html#itertools.islice"><em>islice</em></a> or, to help you conceptualize, "iterable-slice." The code below illustrates that the object returned by slice() is not iterable, while the object returned by islice() is.</p>
</div>

In [12]:
try:
    iter(my_slice)
except TypeError:  # iter() raises TypeError for an object that is not iterable
    print("my_slice is not an iterator object")

my_slice is not an iterator object


In [13]:
from itertools import islice

In [14]:
iter_slice = islice(my_list, 1, 8, 2)

[float(i) for i in iter_slice]

[0.5072, -1.4578, 0.2062, 0.9543]

In [15]:
iter(iter_slice)

<itertools.islice at 0x113c52ae8>
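Two properties of `islice` are worth a quick sketch: it works on sources that cannot be sliced with brackets, such as generators, and it is a one-shot iterator that is empty once consumed. The generator `evens` is a made-up example.

```python
from itertools import count, islice

# count() is endless, so this generator cannot be sliced with brackets.
evens = (n for n in count() if n % 2 == 0)

first_five = list(islice(evens, 5))       # [0, 2, 4, 6, 8]

# islice consumes its source: the next call continues where we stopped.
next_three = list(islice(evens, 3))       # [10, 12, 14]

# An islice object is itself a one-shot iterator.
window = islice(range(10), 1, 8, 2)
first_pass = list(window)                 # [1, 3, 5, 7]
second_pass = list(window)                # [] because it is exhausted
```

This is also why the `iter_slice` object above still exists after the list comprehension but has nothing left to yield.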

----

## Strings

----

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Strings, same as 1-D lists, no sweat</h4>
<p style="color: #cccccc;">Strings allow for the same extended indexing for slices that we saw with a single list. As long as you remember zero indexing, exclusive stopping index, and interval sequencing, you are ready for string slicing.</p>
</div>

In [16]:
my_string = "My name is Cody"

In [17]:
my_string[-4:]

'Cody'

In [18]:
my_string[3:7]

'name'

In [19]:
my_string[::2]

'M aei oy'
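One way strings differ from lists is that they are immutable, so a slice can be read but not assigned to. A minimal sketch, with `sentence` standing in for the notebook's string:

```python
sentence = "My name is Cody"

last_word = sentence[-4:]      # 'Cody'
backwards = sentence[::-1]     # a negative step reverses the string

# Strings are immutable, so assigning to a slice raises TypeError.
try:
    sentence[0:2] = "Hi"
except TypeError:
    assignment_allowed = False
else:
    assignment_allowed = True

# Build a new string from slices instead.
renamed = sentence[:11] + "Ada"
```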

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">On the edge of the rabbit hole</h4>
<p style="color: #cccccc;">To what extent do you want to discuss slicing strings? The following resources illustrate the depth of this topic and its extension to analyzing corpora of textual data.</p>
</div>

- [The built-in `re` module](https://docs.python.org/3/library/re.html)
- [The third party `regex` module](https://pypi.org/project/regex/)
- [The freely available 'Analyzing Text with the Natural Language Toolkit'](https://www.nltk.org/book/)

----

## NumPy Arrays

----

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Numerical Python and the NumPy array</h4>
<p style="color: #cccccc;">Oh NumPy! You have cleaner array print() outputs, less verbose syntax, vectorized operations, and lay the foundation for pandas and many other packages. The following code generates a NumPy array, <em>num_2d_array</em>, from our list of lists.</p>
</div>

In [20]:
import numpy as np

num_2d_array = np.array(l_of_l)
num_2d_array

array([[ 0.38, -0.75, -0.58, -0.41, -1.53],
       [-0.46,  0.65,  2.02,  0.21,  0.87],
       [-1.44,  0.8 ,  0.41, -0.2 , -0.2 ],
       [ 0.43, -0.33,  0.69, -0.27,  0.71],
       [ 0.84,  1.06, -0.66,  1.61,  0.92]])

<div style="background-color: #000054; padding: 25px;">
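<h4 style="color: #ffffff;">Simple NumPy array slicing along one dimension</h4>
<p style="color: #cccccc;">No changes here from the list syntax. A single slice selects along the first dimension (rows), here every second row among the first three.</p>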
</div>

In [21]:
num_2d_array[0:3:2]

array([[ 0.38, -0.75, -0.58, -0.41, -1.53],
       [-1.44,  0.8 ,  0.41, -0.2 , -0.2 ]])

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">2 dimensional NumPy array slicing</h4>
<p style="color: #cccccc;">No more for loops for slicing items from rows within the array. We now have a new syntax that can be thought of as <em>[rows, columns]</em>, and each of the slices supports the extended indexing syntax of <em>start:stop:step</em>.</p>
</div>

In [22]:
num_2d_array[0:3, ::2]

array([[ 0.38, -0.58, -1.53],
       [-0.46,  2.02,  0.87],
       [-1.44,  0.41, -0.2 ]])
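To make the contrast with lists concrete, here is a sketch on a deterministic array (the name `grid` and its values are assumptions for illustration) showing the same block taken both ways.

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)    # hypothetical data: 0..24 in a 5x5 grid

# NumPy: rows 0-2, every second column, in one expression.
block = grid[0:3, ::2]

# The equivalent with a plain list of lists needs a comprehension.
rows = grid.tolist()
block_from_lists = [row[::2] for row in rows[0:3]]
```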

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Easily slice a column</h4>
<p style="color: #cccccc;">The new 2-D NumPy array syntax makes column slices more efficient and less verbose than with a list of lists. The following example returns everything from the first column.</p>
</div>

In [23]:
num_2d_array[:,0]

array([ 0.38, -0.46, -1.44,  0.43,  0.84])

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Another column slice example</h4>
<p style="color: #cccccc;">This example would return, for the first three rows, the fourth column item (index 3).</p>
</div>

In [24]:
num_2d_array[:3,3]

array([-0.41,  0.21, -0.2 ])

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Boolean indexing, supported by a NumPy array near you!</h4>
<p style="color: #cccccc;">By using a threshold, equality, or other comparison operator, you can easily create a boolean mask of a given array. Then you can use this boolean mask within the square bracket notation to return the items where the index matches <em>True</em>. In the following example, the code would return every value in the <em>num_2d_array</em> that is greater than zero.</p>
</div>

In [25]:
boolean_index = num_2d_array > 0
boolean_index

array([[ True, False, False, False, False],
       [False,  True,  True,  True,  True],
       [False,  True,  True, False, False],
       [ True, False,  True, False,  True],
       [ True,  True, False,  True,  True]])

In [26]:
# NumPy also supports boolean indexing
num_2d_array[boolean_index]

array([0.38, 0.65, 2.02, 0.21, 0.87, 0.8 , 0.41, 0.43, 0.69, 0.71, 0.84,
       1.06, 1.61, 0.92])
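Masks can also be combined, and they can be used on the left-hand side of an assignment. A small sketch with made-up data in `values`:

```python
import numpy as np

values = np.array([[-2, -1, 0], [1, 2, 3]])    # hypothetical data

positive = values[values > 0]                  # 1-D result: [1, 2, 3]

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required.
small = values[(values > -2) & (values < 2)]   # [-1, 0, 1]

# A mask can also select the items to overwrite.
clipped = values.copy()
clipped[clipped < 0] = 0
```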

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">What is the Ellipsis object?</h4>
<p style="color: #cccccc;">When you have a 1-D or 2-D array, the utility of the Ellipsis object is hard to grasp. However, if your ndarray, <em>x</em>, has <em>x.ndim >= 3</em>, then we can talk about the Ellipsis object for slicing NumPy arrays. This is because the Ellipsis object expands to as many full slices (<em>:</em>) as are needed to cover the dimensions not stated in your slice notation. The following example returns, for everything found in the second item of the first dimension, the second item of the fourth dimension.</p>
</div>

In [27]:
multi_dim_array = np.arange(16).reshape(2,2,2,2)

In [28]:
multi_dim_array

array([[[[ 0,  1],
         [ 2,  3]],

        [[ 4,  5],
         [ 6,  7]]],


       [[[ 8,  9],
         [10, 11]],

        [[12, 13],
         [14, 15]]]])

In [29]:
multi_dim_array[1,...,1]

array([[ 9, 11],
       [13, 15]])
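A quick way to convince yourself of what the Ellipsis does is to spell out the full slices it replaces. This sketch rebuilds the same 4-D array under the hypothetical name `cube`.

```python
import numpy as np

cube = np.arange(16).reshape(2, 2, 2, 2)

with_ellipsis = cube[1, ..., 1]
spelled_out = cube[1, :, :, 1]     # the Ellipsis stands in for ':, :'

last_axis = cube[..., 0]           # same as cube[:, :, :, 0]
```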

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Simple slicing with indices</h4>
<p style="color: #cccccc;">The following example would return rows in the sequence of indices given. If you read the code logically it says, return the first row, then the second, the first again, the third, the first again, and then the fourth, in a new NumPy array.</p>
</div>

In [30]:
num_2d_array[[0,1,0,2,0,3]]

array([[ 0.38, -0.75, -0.58, -0.41, -1.53],
       [-0.46,  0.65,  2.02,  0.21,  0.87],
       [ 0.38, -0.75, -0.58, -0.41, -1.53],
       [-1.44,  0.8 ,  0.41, -0.2 , -0.2 ],
       [ 0.38, -0.75, -0.58, -0.41, -1.53],
       [ 0.43, -0.33,  0.69, -0.27,  0.71]])

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Advanced slicing notation with indices</h4>
<p style="color: #cccccc;">The following example also takes sequences of indices, <strong>but</strong> with one sequence per dimension the sequences are broadcast against each other and paired up element by element. If you read the code logically it says, return from the second, third, and fourth rows, the second, third, and fourth column items, <strong>respectively</strong>, in a new NumPy array.</p>
</div>

In [31]:
num_2d_array[(1,2,3),(1,2,3)]

array([ 0.65,  0.41, -0.27])
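If what you wanted was the full 3x3 block rather than the paired items, one option is `np.ix_`, which crosses every listed row with every listed column. A sketch on a hypothetical `grid`:

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)    # hypothetical data: 0..24 in a 5x5 grid

# Paired indexes: picks (1, 1), (2, 2), (3, 3), part of the diagonal.
paired = grid[[1, 2, 3], [1, 2, 3]]

# np.ix_ builds an open mesh: each listed row crossed with each listed column.
block = grid[np.ix_([1, 2, 3], [1, 2, 3])]
```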

Remember that NumPy arrays are homogeneous with regard to datatype and that simple 1-D slicing in NumPy arrays uses the same syntax as slicing 1-D lists. One difference: a NumPy slice is a view of the original array, whereas a list slice is a new list.
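Although the syntax matches, the results behave differently when modified: a basic NumPy slice is a view onto the original data, while a list slice is an independent copy. A minimal sketch with made-up names:

```python
import numpy as np

numbers = [0, 1, 2, 3, 4]     # hypothetical data
array = np.array(numbers)

list_slice = numbers[1:4]     # a new, independent list
array_slice = array[1:4]      # a view onto the same memory as array

list_slice[0] = 99
array_slice[0] = 99

# numbers is unchanged, but array was modified through the view.
# Use array[1:4].copy() when an independent array is needed.
```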

----

## Dictionaries

----

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Creating a dictionary</h4>
<p style="color: #cccccc;">The following code creates two lists, merges them together with the zip function, and then adds the key, value pairs to an empty dictionary. This new dictionary, <em>mapping</em>, is then printed to screen.</p>
</div>

In [32]:
key_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
value_list = [[1, 2, 3, 4, 5], 'My name is Cody', 77, 0.1234, (123, 456), {"a dict": "within a dict"}, {1, 2, 3, 4}]

mapping = {}

for key, value in zip(key_list, value_list):
    mapping[key] = value

In [33]:
mapping

{'a': [1, 2, 3, 4, 5],
 'b': 'My name is Cody',
 'c': 77,
 'd': 0.1234,
 'e': (123, 456),
 'f': {'a dict': 'within a dict'},
 'g': {1, 2, 3, 4}}

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">A more Pythonic way</h4>
<p style="color: #cccccc;">To be more Pythonic with our code, we can use a dict comprehension to commit the key, value pairs of the zipped lists to a new dictionary, <em>comp_mapping</em>.</p>
</div>

In [34]:
comp_mapping = {key : value for key, value in zip(key_list, value_list)}

comp_mapping

{'a': [1, 2, 3, 4, 5],
 'b': 'My name is Cody',
 'c': 77,
 'd': 0.1234,
 'e': (123, 456),
 'f': {'a dict': 'within a dict'},
 'g': {1, 2, 3, 4}}

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slice a value from a dictionary</h4>
<p style="color: #cccccc;">To slice a value from a dictionary, you can use square bracket notation and provide the associated key. The following example shows how this works.</p>
</div>

In [35]:
mapping['c']

77

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Multiple values returned from a dictionary</h4>
<p style="color: #cccccc;">In this example, we use a list of keys and iterate over it to return all the associated values. The resulting list from the list comprehension is then printed to screen.</p>
</div>

In [36]:
key_slice = ['a', 'c', 'g']

values_slice = [mapping[key] for key in key_slice]
values_slice

[[1, 2, 3, 4, 5], 77, {1, 2, 3, 4}]
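A few variations on pulling several values at once, sketched on a hypothetical dictionary called `inventory`. `operator.itemgetter` is a standard library alternative to the comprehension, and `get()` covers keys that may be missing.

```python
from operator import itemgetter

inventory = {'a': 1, 'b': 2, 'c': 3, 'd': 4}    # hypothetical data
wanted = ['a', 'c']

values = [inventory[key] for key in wanted]

# itemgetter with several keys returns a tuple of the matching values.
same_values = list(itemgetter(*wanted)(inventory))

# A "sub-dictionary" keeps the keys alongside the values.
subset = {key: inventory[key] for key in wanted}

# get() avoids a KeyError when a key might be missing.
with_default = [inventory.get(key, 0) for key in ['a', 'z']]
```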

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Not quite a slice</h4>
<p style="color: #cccccc;">If we aren't looking to slice the dictionary but simply query it for a value, we can use the <em>in</em> operator to check membership within the dictionary. The result is a boolean. Depending on what you are looking for (keys or values), you will need to use the values(), keys(), or items() methods.</p>
</div>

In [37]:
(123,456) in mapping.values()

True

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">An interesting use case</h4>
<p style="color: #cccccc;">What if I want to slice the keys that match the value 77? The dictionary data structure does not have a built-in method for achieving this task. The next example sets up a function that will give us the first key that matches 77.</p>
</div>

In [38]:
def dict_search(dictionary, search_term):

    ''' This function takes two inputs. First is the
    dictionary you want to search, second is the value
    you want to search for. The function will return the
    FIRST key that matches that value.
    '''

    my_keys_indexed = list(dictionary.keys())
    my_values_indexed = list(dictionary.values())

    ans = my_keys_indexed[my_values_indexed.index(search_term)]
    
    return ans

In [39]:
dict_search(mapping, 77)

'c'

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">An extension to the idea</h4>
<p style="color: #cccccc;">This version of the function will return all keys that match the value 77.</p>
</div>

In [40]:
def dict_search_all(dictionary, search_term):
    
    '''This function loops over the dictionary and
    returns all keys that match the search term. The
    function takes two inputs:
    
    dictionary: a dict() object
    search_term: a value within the supplied dict()
    '''
    
    my_values_enumerated = enumerate(dictionary.values())
    
    idx_search_matches = []
    
    for idx, value in my_values_enumerated:
        if value == search_term:
            idx_search_matches.append(idx)
            
    key_matches = []
    
    for idx in idx_search_matches:
        key_matches.append(list(dictionary.keys())[idx])
        
    return key_matches

In [41]:
dict_search_all(mapping, 77)

['c']

In [42]:
for k, v in mapping.items():
    print(k, v)

a [1, 2, 3, 4, 5]
b My name is Cody
c 77
d 0.1234
e (123, 456)
f {'a dict': 'within a dict'}
g {1, 2, 3, 4}


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">However,</h4>
<p style="color: #cccccc;">We can achieve the same functionality with a for loop and a simple equality statement.</p>
</div>

In [43]:
search_term = 77

for k, v in mapping.items():
    if v == search_term:
        print(k, v)

c 77
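The same reverse lookup can also be written as a comprehension. This is only a sketch: `keys_for_value` and the `scores` dictionary are hypothetical names, not part of the notebook.

```python
def keys_for_value(dictionary, search_term):
    """Return every key whose value equals search_term (hypothetical helper)."""
    return [key for key, value in dictionary.items() if value == search_term]


scores = {'ann': 77, 'bob': 82, 'cat': 77}    # hypothetical data

matches = keys_for_value(scores, 77)
no_match = keys_for_value(scores, 100)

# next() with a default gives the first match without raising when none exists.
first = next((k for k, v in scores.items() if v == 77), None)
```

Unlike the index-based version above, a missing value simply produces an empty list rather than an exception.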


----

## `collections`

----

### `Counter`

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">The built-in collections module</h4>
<p style="color: #cccccc;">This module provides some useful and efficient container types, including the Counter object. What if you wanted to slice the three most common values from a dataset? The Counter object has just the right method, <em>most_common()</em>. First we import Counter from the collections module.</p>
</div>

In [44]:
from collections import Counter

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Counting in dictionaries</h4>
<p style="color: #cccccc;">If the values of the dictionary are integers or floats, you can pass the key using square bracket notation to the Counter object to return the value assigned to that key. These examples mostly show the structure of a counter object. As the Counter moves over the iterable, it assigns the unique values to the keys and updates the value with the count. This is a common task within for loops that can be expedited with the Counter object.</p>
</div>

In [45]:
c = Counter({'red': 4, 'blue': 2})
c['red']

4

In [46]:
c2 = Counter({'red': 4.2, 'blue': 5.3})
c2['red']

4.2

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Counting in strings</h4>
<p style="color: #cccccc;">If we create a Counter object from a string, we can use square bracket notation to query the number of times a letter occurs. Or, we can use the most_common method of the Counter object to slice the three most common characters.</p>
</div>

In [47]:
c = Counter('My name is Cody, and I love to program with Python')
c['y']

3

In [48]:
c.most_common(3)

[(' ', 10), ('o', 5), ('y', 3)]
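Counter works on any iterable of hashable items, not just strings. A brief sketch counting words instead of characters, with a made-up sentence:

```python
from collections import Counter

# Hypothetical data: a list of words rather than a string of characters.
words = "the the the fox fox dog".split()
tally = Counter(words)

top_two = tally.most_common(2)     # [('the', 3), ('fox', 2)]
missing = tally['cat']             # absent keys count as 0 rather than raising

# update() adds further observations to the existing counts.
tally.update(['fox', 'fox'])
```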

### `deque`

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing the start or end of a queue</h4>
<p style="color: #cccccc;">It is worth the effort to work through the examples provided in the Python documentation for <a href="https://docs.python.org/3.7/library/collections.html?highlight=collections#collections.deque">deque</a>. There are also excellent recipes for common use cases. The deque object comes with many useful and efficient methods, just like Counter. With deque, the methods are primarily concerned with appending and popping values from the beginning and end of an ordered sequence. The following code sets up a list of ordered float values. Then a deque object is generated from this list.</p>
</div>

In [49]:
my_queue = [x for x in np.arange(4, 25, 0.5)]

In [50]:
from collections import deque

d = deque(my_queue)

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing the end or start of a queue</h4>
<p style="color: #cccccc;">The next two examples will slice the end and the beginning from the deque object, respectively.</p>
</div>

In [51]:
d.pop()

24.5

In [52]:
d.popleft()

4.0

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Adding to the end or start of a queue</h4>
<p style="color: #cccccc;">The next two examples will add values to the end and the beginning of the deque object, respectively.</p>
</div>

In [53]:
d.append(24.5)
d[-1]

24.5

In [54]:
d.appendleft(4.0)
d[0]

4.0
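Two details about deque that are easy to trip over, sketched with made-up data: a deque can be indexed but not sliced with brackets, and the `maxlen` argument turns it into a fixed-size window.

```python
from collections import deque
from itertools import islice

d = deque([1, 2, 3, 4, 5])    # hypothetical data

last = d.pop()                # 5
first = d.popleft()           # 1

# A deque supports indexing such as d[0], but not slice notation.
try:
    d[0:2]
except TypeError:
    sliceable = False
else:
    sliceable = True

# islice gives slice-like access without converting to a list first.
first_two = list(islice(d, 2))

# maxlen keeps only the most recent items.
recent = deque(maxlen=3)
for n in range(6):
    recent.append(n)
```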

----

## Pandas

----

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">The pandas series</h4>
<p style="color: #cccccc;">First we will import pandas as pd and create a pandas series. To make things easy, we are creating the series from the my_queue list from before.</p>
</div>

In [55]:
import pandas as pd

In [56]:
my_series = pd.Series(my_queue)

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Series slicing</h4>
<p style="color: #cccccc;">In this example we use the 1-D list or NumPy array syntax to slice the first five items of the series. Since pandas is built on top of Numerical Python, we might expect to see similarities. The output shows the index labels for these five items as well.</p>
</div>

In [57]:
my_series[0:5]

0    4.0
1    4.5
2    5.0
3    5.5
4    6.0
dtype: float64

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Updating the index of our series</h4>
<p style="color: #cccccc;">We take a slice of the first 26 items, and then assign the letters of the English alphabet to be the index of our series.</p>
</div>

In [58]:
new_series = my_series[0:26]
len(new_series)

26

In [59]:
index_list = [x for x in "abcdefghijklmnopqrstuvwxyz"]
new_series.index = index_list

In [60]:
new_series[0:5]

a    4.0
b    4.5
c    5.0
d    5.5
e    6.0
dtype: float64

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing a series with the index label</h4>
<p style="color: #cccccc;">In this example we use the string index label to return the associated value. To do this we pass the string <em>'g'</em> using square bracket notation to the pandas series object.</p>
</div>

In [61]:
new_series['g']

7.0
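With a labelled index it helps to be explicit about whether you mean labels or positions. The sketch below uses the `.loc` (label) and `.iloc` (position) indexers on a small, hypothetical Series `s`. Note that a label slice includes its stop label.

```python
import pandas as pd

# Hypothetical data mirroring the first five items above.
s = pd.Series([4.0, 4.5, 5.0, 5.5, 6.0], index=list("abcde"))

by_label = s.loc['c']          # look up by index label
by_position = s.iloc[2]        # look up by integer position

positional = s.iloc[1:3]       # stop excluded: rows b, c
labelled = s.loc['b':'d']      # stop included: rows b, c, d
```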

### datetime indexing

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Datetime indices</h4>
<p style="color: #cccccc;">There is a special data type for pandas datetime indices, which gives us new access to powerful and extensible syntax.</p>
</div>

In [62]:
from datetime import datetime
from datetime import timedelta

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Creating a list of datetimes</h4>
<p style="color: #cccccc;">In this example we use a list comprehension to create a list of 26 dates that extend from the current date, backwards through the calendar.</p>
</div>

In [63]:
base = datetime.today()
date_index = [base - timedelta(days=x) for x in range(0, 26)]

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Solving an intuition issue</h4>
<p style="color: #cccccc;">We would like to assign the index in ascending date order. Therefore, we sort the datetime index before we assign it to the index of our series.</p>
</div>

In [64]:
new_series.index = sorted(date_index)

In [65]:
new_series[0:5]

2019-04-03 13:30:52.334366    4.0
2019-04-04 13:30:52.334366    4.5
2019-04-05 13:30:52.334366    5.0
2019-04-06 13:30:52.334366    5.5
2019-04-07 13:30:52.334366    6.0
dtype: float64

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing values using extensible date formats</h4>
<p style="color: #cccccc;">There are many different date formats that are accepted by pandas slicing notation. As long as the notation is not ambiguous, you should get the output that you would expect.</p>
</div>

In [66]:
new_series['April, 6 2019']

2019-04-06 13:30:52.334366    5.5
dtype: float64

In [67]:
new_series['2019-04-04':'2019-04-14']

2019-04-04 13:30:52.334366    4.5
2019-04-05 13:30:52.334366    5.0
2019-04-06 13:30:52.334366    5.5
2019-04-07 13:30:52.334366    6.0
2019-04-08 13:30:52.334366    6.5
2019-04-09 13:30:52.334366    7.0
2019-04-10 13:30:52.334366    7.5
2019-04-11 13:30:52.334366    8.0
2019-04-12 13:30:52.334366    8.5
2019-04-13 13:30:52.334366    9.0
2019-04-14 13:30:52.334366    9.5
dtype: float64
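Because the dates above depend on when the notebook was run, here is a reproducible sketch with a fixed, hypothetical index built by `pd.date_range`. It shows that a date string matches every timestamp on that day and that both ends of a date slice are included.

```python
import pandas as pd

# Hypothetical data: ten consecutive days, each stamped at 13:30.
index = pd.date_range("2019-04-03 13:30", periods=10, freq="D")
s = pd.Series(range(10), index=index)

one_day = s.loc["2019-04-06"]                 # every timestamp on that day
span = s.loc["2019-04-04":"2019-04-06"]       # both end dates are included
```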

In [68]:
new_df = pd.DataFrame(new_series)

In [69]:
new_df['2019-04-02':'2019-04-04']

                              0
2019-04-03 13:30:52.334366  4.0
2019-04-04 13:30:52.334366  4.5


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing values using day, month, or year</h4>
<p style="color: #cccccc;">Here we can see the unique components of our datetime index object. Using these we can slice all the values that occur on a given day of the month (the seventh, in the first example), in the fourth month of the year, or in a specific year.</p>
</div>

- [More datetime64 component options](https://pandas.pydata.org/pandas-docs/stable/reference/indexing.html#time-date-components)

In [70]:
new_df[new_df.index.day == 7]

                              0
2019-04-07 13:30:52.334366  6.0


In [71]:
new_df[new_df.index.month == 4]

                              0
2019-04-03 13:30:52.334366  4.0
2019-04-04 13:30:52.334366  4.5
2019-04-05 13:30:52.334366  5.0
2019-04-06 13:30:52.334366  5.5
2019-04-07 13:30:52.334366  6.0
2019-04-08 13:30:52.334366  6.5
2019-04-09 13:30:52.334366  7.0
2019-04-10 13:30:52.334366  7.5
2019-04-11 13:30:52.334366  8.0
2019-04-12 13:30:52.334366  8.5
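The component filters are easier to check on an index that crosses a month boundary. This is a sketch with hypothetical, fixed dates. `dayofweek` is another component, where Monday is 0.

```python
import pandas as pd

# Hypothetical data: six days spanning the end of March 2019.
index = pd.date_range("2019-03-30", periods=6, freq="D")
df = pd.DataFrame({"value": range(6)}, index=index)

april = df[df.index.month == 4]
first_of_month = df[df.index.day == 1]
weekends = df[df.index.dayofweek >= 5]    # 5 and 6 are Saturday and Sunday
```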


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">General DataFrame Methods</h4>
<p style="color: #cccccc;">head() will slice the first 5 rows of a DataFrame (tail() returns the last 5 rows), or you can pass the number of rows you would like to return. info() will print a summary of the columns, including datatypes and the number of non-null values. describe() will return summary statistics for each numeric column of the DataFrame.</p>
</div>

In [72]:
another_df = pd.DataFrame({'a':np.arange(5, 10, 0.25), 'b':np.arange(10, 15, 0.25)})

In [73]:
another_df.head()

      a      b
0   5.0   10.0
1  5.25  10.25
2   5.5   10.5
3  5.75  10.75
4   6.0   11.0


In [74]:
another_df.tail()

       a      b
15  8.75  13.75
16   9.0   14.0
17  9.25  14.25
18   9.5   14.5
19  9.75  14.75


In [75]:
another_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 2 columns):
a    20 non-null float64
b    20 non-null float64
dtypes: float64(2)
memory usage: 400.0 bytes


In [76]:
another_df.describe()

             a        b
count     20.0     20.0
mean     7.375   12.375
std    1.47902  1.47902
min        5.0     10.0
25%     6.1875  11.1875
50%      7.375   12.375
75%     8.5625  13.5625
max       9.75    14.75


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Creating a descriptive DataFrame</h4>
<p style="color: #cccccc;">The next few code blocks create a labeled DataFrame with descriptive row and column indexes.</p>
</div>

In [77]:
short_df = another_df[0:13]

In [78]:
# Build the row labels 'row1' through 'row13'
rows_list = [f'row{n}' for n in range(1, 14)]
rows_list

['row1',
 'row2',
 'row3',
 'row4',
 'row5',
 'row6',
 'row7',
 'row8',
 'row9',
 'row10',
 'row11',
 'row12',
 'row13']

In [79]:
short_df.index = rows_list
short_df.columns = ['column1', 'column2']

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing a column from a DataFrame</h4>
<p style="color: #cccccc;">Passing a single column label in square brackets returns that column as a pandas Series.</p>
</div>

In [80]:
short_df['column1']

row1     5.00
row2     5.25
row3     5.50
row4     5.75
row5     6.00
row6     6.25
row7     6.50
row8     6.75
row9     7.00
row10    7.25
row11    7.50
row12    7.75
row13    8.00
Name: column1, dtype: float64

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing a column from a DataFrame, using .loc notation</h4>
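<p style="color: #cccccc;">The <em>.loc</em> indexer is label-based and takes <em>[rows, columns]</em>. A bare colon selects every row.</p>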
</div>

In [81]:
short_df.loc[:,'column2']

row1     10.00
row2     10.25
row3     10.50
row4     10.75
row5     11.00
row6     11.25
row7     11.50
row8     11.75
row9     12.00
row10    12.25
row11    12.50
row12    12.75
row13    13.00
Name: column2, dtype: float64

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing multiple columns from a DataFrame</h4>
<p style="color: #cccccc;">Passing a list of column labels returns a DataFrame containing just those columns.</p>
</div>

In [82]:
short_df[['column1', 'column2']]

       column1  column2
row1       5.0     10.0
row2      5.25    10.25
row3       5.5     10.5
row4      5.75    10.75
row5       6.0     11.0
row6      6.25    11.25
row7       6.5     11.5
row8      6.75    11.75
row9       7.0     12.0
row10     7.25    12.25


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing a subset of rows</h4>
<p style="color: #cccccc;">A plain slice in square brackets selects rows by position, with the usual exclusive stop.</p>
</div>

In [83]:
short_df[3:5]

      column1  column2
row4     5.75    10.75
row5      6.0     11.0


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing a subset of rows, using .iloc notation</h4>
<p style="color: #cccccc;">The <em>.iloc</em> indexer is position-based, so the same slice returns the same rows.</p>
</div>

In [84]:
short_df.iloc[3:5]

      column1  column2
row4     5.75    10.75
row5      6.0     11.0


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing a subset of rows from a column within a DataFrame</h4>
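<p style="color: #cccccc;">With <em>.loc</em>, a list of row labels and a column label returns those values in the order requested. A positional row slice can also be chained with a column label.</p>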
</div>

In [85]:
short_df.loc[['row4', 'row7', 'row1'],'column2']

row4    10.75
row7    11.50
row1    10.00
Name: column2, dtype: float64

In [86]:
short_df[4:10]['column1']

row5     6.00
row6     6.25
row7     6.50
row8     6.75
row9     7.00
row10    7.25
Name: column1, dtype: float64

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Wait, you can flip the syntax?</h4>
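<p style="color: #cccccc;">Yes. Select the column first to get a Series, then slice that Series by position or look up a row label.</p>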
</div>

In [87]:
short_df['column1'][4:10]

row5     6.00
row6     6.25
row7     6.50
row8     6.75
row9     7.00
row10    7.25
Name: column1, dtype: float64

In [88]:
short_df['column1']['row11']

7.5

<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">But you cannot flip the syntax here</h4>
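<p style="color: #cccccc;">Plain square brackets on a DataFrame look up column labels, so passing a row label raises a <em>KeyError</em>.</p>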
</div>

In [89]:
try:
    short_df['row11']['column1']
except KeyError:  # 'row11' is a row label, not a column label
    print("You have to state the column index first")

You have to state the column index first
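If the column-first order of chained brackets feels backwards, a single `.loc` or `.iloc` call takes rows first and columns second, like the NumPy syntax. A minimal sketch on a small, hypothetical DataFrame `df`:

```python
import pandas as pd

# Hypothetical data shaped like the first rows of short_df.
df = pd.DataFrame(
    {"column1": [5.0, 5.25, 5.5, 5.75], "column2": [10.0, 10.25, 10.5, 10.75]},
    index=["row1", "row2", "row3", "row4"],
)

chained = df["column1"]["row3"]          # column first, then row
by_label = df.loc["row3", "column1"]     # rows first, then columns
by_position = df.iloc[2, 0]              # the same cell by integer position

# Lists of labels on both axes return a smaller DataFrame.
rows_and_cols = df.loc[["row4", "row1"], ["column2"]]
```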


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">When dealing with lists</h4>
<p style="color: #cccccc;">You have to state the row index first which is the opposite to labelled index slicing with DataFrames.</p>
</div>

In [90]:
list_this = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]

list_this[1][1]

7

In [91]:
try:
    # The first index is always the row; there are only two rows, so index 2 fails
    list_this[2][1]
except IndexError:
    print("You have to state the 'row' index first")

You have to state the 'row' index first


<div style="background-color: #000054; padding: 25px;">
<h4 style="color: #ffffff;">Slicing a DataFrame with a boolean mask</h4>
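<p style="color: #cccccc;">The <em>between(7, 9)</em> call builds a boolean mask that is True where column1 lies between 7 and 9, inclusive. Indexing the DataFrame with that mask keeps only the matching rows.</p>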
</div>

In [92]:
masked_df = short_df[short_df['column1'].between(7,9)]

In [93]:
masked_df

       column1  column2
row9       7.0     12.0
row10     7.25    12.25
row11      7.5     12.5
row12     7.75    12.75
row13      8.0     13.0
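The mask can also be written with comparison operators, and it can be combined with a column selection through `.loc`. A sketch with made-up values in `df`:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"column1": [6.5, 7.0, 7.5, 9.5], "column2": [1, 2, 3, 4]})

in_range = df[df["column1"].between(7, 9)]                 # inclusive on both ends
same = df[(df["column1"] >= 7) & (df["column1"] <= 9)]     # equivalent mask

# Mask the rows and pick a column in one step.
picked = df.loc[df["column1"] > 7, "column2"]
```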


<div style="background-color: #000054; padding: 25px;">
<h1 style="color: #ffffff;">Thank you Very Much!</h1>
</div>

<div>
    <table style="width:100%">
      <tr>
        <td><img width="64" src="https://s3-us-west-2.amazonaws.com/schellenbergers3bucket/GitHub-Mark-120px-plus.png"></td>
        <td><h2>github.com/cschellenberger</h2></td> 
      </tr>
      <tr>
        <td><img width="64" src="https://s3-us-west-2.amazonaws.com/schellenbergers3bucket/In-Blue-72%402x.png"></td>
        <td><h2>linkedin.com/in/codyschellenberger</h2></td> 
      </tr>
    </table>
</div>