# Fetching and filtering data from the database
    
The dices-client provides predefined methods for fetching, parsing, and manipulating data from the DICES database.
    
This notebook explains the DataGroup types included in the client, so that you can use them effectively and efficiently parse the data returned by the database.

## Preliminaries

First, we have to create a connection to the DICES API:

In [None]:
import dicesapi
from dicesapi import DicesAPI
from dicesapi.jupyter import NotebookPBar

api = DicesAPI(
    dices_api = 'https://fierce-ravine-99183.herokuapp.com/api/',
    cts_api = 'http://cts.perseids.org/api/cts/',
    progress_class = NotebookPBar,
)

## DataGroups
    
Every method that returns a group of data objects returns a DataGroup.

For example, `.getSpeeches()` returns a DataGroup:

In [None]:
speeches = api.getSpeeches(spkr_name='Achilles', progress=True)

In [None]:
isinstance(speeches, dicesapi._DataGroup)

There is a DataGroup subclass for each data class the API provides (e.g. `CharacterGroup`, `SpeechGroup`, `CharacterInstanceGroup`), and each method that fetches data returns the corresponding subclass (for example, `.getSpeeches()` returns a `SpeechGroup`):

In [None]:
type(speeches)
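The relationship between the generic DataGroup and its specific children is presumably ordinary class inheritance, which is why the `isinstance` check above succeeds while `type()` reports the more specific class. The minimal sketch below shows that pattern; the `Toy*` class names are hypothetical stand-ins, not dicesapi's actual classes, and the variable is named `toy_speeches` so it does not overwrite `speeches` in this notebook.

```python
# Hypothetical classes illustrating the parent/child pattern;
# these are NOT the real dicesapi classes.

class ToyDataGroup:
    """Shared base class: behaviour common to every group lives here."""

    def __init__(self, items):
        self.items = list(items)


class ToySpeechGroup(ToyDataGroup):
    """Child class for speeches; speech-specific methods would go here."""


class ToyCharacterGroup(ToyDataGroup):
    """Child class for characters."""


toy_speeches = ToySpeechGroup(['speech A', 'speech B'])

print(isinstance(toy_speeches, ToyDataGroup))       # True: it is a DataGroup...
print(type(toy_speeches).__name__)                  # ToySpeechGroup: ...of a specific kind
print(isinstance(toy_speeches, ToyCharacterGroup))  # False: sibling classes don't match
```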

DataGroups have plenty of built-in functionality for manipulating their contents.
    
## List-like behaviour
 
DataGroups behave much like Python lists: you can index them, slice them, iterate over them, and take their length in the same way.

### Fetching an element of the datagroup

In [None]:
print('first speech: ' + str(speeches[0]))
print('sixth speech: ' + str(speeches[5]))

### Iterating over the DataGroup

In [None]:
for i, s in enumerate(speeches[:5]):
    print(f'item {i}: {s}')

### Get number of elements in the DataGroup

In [None]:
print(len(speeches))
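In Python, list-like behaviour comes from a few special methods (`__getitem__`, `__len__`, `__iter__`). The following is a minimal sketch, assuming a group simply wraps a Python list; `ToyGroup` is hypothetical, and returning a new group from a slice is a design assumption, not documented dicesapi behaviour.

```python
class ToyGroup:
    """Hypothetical list-like container (not dicesapi's implementation)."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, key):
        # A slice returns a new group of the same type; an int returns one element
        if isinstance(key, slice):
            return type(self)(self._items[key])
        return self._items[key]

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


toy_group = ToyGroup(['a', 'b', 'c', 'd', 'e', 'f'])
print(toy_group[0])        # a
print(toy_group[5])        # f
print(len(toy_group))      # 6
for i, item in enumerate(toy_group[:2]):
    print(f'item {i}: {item}')  # item 0: a, then item 1: b
```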

## Operators

### Addition

The `+` operator concatenates two DataGroups. It creates a brand-new DataGroup containing the elements of both groups, with duplicate values removed.

In [None]:
ach_spkr = api.getSpeeches(spkr_name='Achilles', work_title='Iliad')
ach_addr = api.getSpeeches(addr_name='Achilles', work_title='Iliad')

print(f'Achilles is speaker in {len(ach_spkr)} speeches in the Iliad.')
print(f'Achilles is addressee in {len(ach_addr)} speeches in the Iliad.')

all_ach = ach_spkr + ach_addr
print(f'Achilles is involved in {len(all_ach)} speeches in the Iliad (in any role).')

**Notes**

- To keep duplicates, use `.extend(otherDataGroup, duplicates=True)`.

- To add group2 directly to group1 instead of creating a new DataGroup, use `group1 += group2`.
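Removing duplicates matters in the example above: a speech in which Achilles addresses himself is returned by both queries but should be counted only once. Here is a minimal sketch of order-preserving, de-duplicating concatenation on plain lists, assuming the elements are hashable (for example, speech ids); `add_groups` and the ids are hypothetical, not part of dicesapi.

```python
def add_groups(group1, group2):
    """Concatenate two sequences into a new list without duplicates.

    dict.fromkeys keeps the first occurrence of each element and, since
    Python 3.7, preserves insertion order.
    """
    return list(dict.fromkeys([*group1, *group2]))


spkr_ids = [101, 102, 103]  # hypothetical ids of speeches by Achilles
addr_ids = [103, 104]       # hypothetical ids of speeches addressed to him

print(add_groups(spkr_ids, addr_ids))  # [101, 102, 103, 104]
```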

### Subtraction

Subtraction (`group1 - group2`) creates a new DataGroup containing the elements of group1 that are not in group2.

In [None]:
# speeches by Achilles that are not addressed to himself

ach_to_others = ach_spkr - ach_addr
print(len(ach_to_others))

**Notes**

- To modify group1 in place instead of creating a new DataGroup, use `group1 -= group2`.
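Subtraction behaves like an order-preserving set difference. The minimal sketch below works on plain lists, assuming hashable elements; `subtract_groups` and the ids are hypothetical, not dicesapi's implementation.

```python
def subtract_groups(group1, group2):
    """Return a new list of the elements of group1 that are not in group2."""
    excluded = set(group2)  # set membership tests are O(1) on average
    return [item for item in group1 if item not in excluded]


spkr_ids = [101, 102, 103]  # hypothetical ids of speeches by Achilles
addr_ids = [103, 104]       # hypothetical ids of speeches addressed to him

print(subtract_groups(spkr_ids, addr_ids))  # [101, 102]
```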

## Methods

### General Purpose

#### `extend(DataGroup, duplicates=False)`

`extend` works like `+=`, but it lets you specify whether duplicate values should be kept. In most cases `+` is sufficient; when you need to keep duplicates, use `extend(otherDataGroup, duplicates=True)`.
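The difference between the two modes can be sketched on plain lists. This is a minimal sketch assuming hashable elements; `extend_group` is a hypothetical helper that modifies its first argument in place (mirroring `+=`) and is not dicesapi's code.

```python
def extend_group(group1, group2, duplicates=False):
    """Append the elements of group2 to group1 in place.

    With duplicates=False, elements already in group1 are skipped.
    """
    if duplicates:
        group1.extend(group2)
        return
    seen = set(group1)
    for item in group2:
        if item not in seen:
            seen.add(item)
            group1.append(item)


ids = [101, 102, 103]
extend_group(ids, [103, 104])
print(ids)  # [101, 102, 103, 104]

ids = [101, 102, 103]
extend_group(ids, [103, 104], duplicates=True)
print(ids)  # [101, 102, 103, 103, 104]
```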

#### `intersect(DataGroup)`

`intersect` returns a new DataGroup containing only the elements that appear in both DataGroups.

In [None]:
# only speeches where Achilles talks to himself
ach_self = ach_spkr.intersect(ach_addr)

print(len(ach_self))
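Intersection keeps only the shared elements. The minimal sketch below works on plain lists, assuming hashable elements and that the result follows the order of the first group; `intersect_groups` and the ids are hypothetical.

```python
def intersect_groups(group1, group2):
    """Return a new list of the elements of group1 that also appear in group2."""
    shared = set(group2)
    return [item for item in group1 if item in shared]


spkr_ids = [101, 102, 103]  # hypothetical ids of speeches by Achilles
addr_ids = [103, 104]       # hypothetical ids of speeches addressed to him

print(intersect_groups(spkr_ids, addr_ids))  # [103]
```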

### Filters

Filters are methods that select the elements of a DataGroup with specific attribute values. Each DataGroup subclass has its own specific filters, but some filters are shared by all DataGroups.
    
To learn more about filters, see the "Using and Understanding Filters" notebook.

#### `filterAttribute(attribute, value)`

`filterAttribute` checks every element of a DataGroup and keeps only those whose given attribute matches the given value. It compares literal values only (strings, integers, and floats).

In [None]:
# only speeches that initiate a conversation
filtered = ach_spkr.filterAttribute('part', 1)
print(len(filtered))
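Conceptually, `filterAttribute` is an equality test applied to one attribute of each element. The minimal sketch below uses `getattr`; `ToySpeech` and `filter_attribute` are hypothetical stand-ins, not dicesapi classes.

```python
from dataclasses import dataclass


@dataclass
class ToySpeech:
    """Hypothetical stand-in for a speech record."""
    id: int
    part: int


def filter_attribute(items, attribute, value):
    """Keep the items whose `attribute` equals `value`."""
    return [item for item in items if getattr(item, attribute) == value]


toy_speeches = [ToySpeech(1, 1), ToySpeech(2, 2), ToySpeech(3, 1)]
print([s.id for s in filter_attribute(toy_speeches, 'part', 1)])  # [1, 3]
```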

#### `filterList(attribute, list)`

`filterList` checks every element of a DataGroup and keeps only those whose value for the given attribute is contained in the supplied list. This lets you match several values with a single filter.

In [None]:
# only speeches that represent an even-numbered turn
filtered = ach_spkr.filterList('part', [2, 4, 6, 8, 10, 12])
print(len(filtered))
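`filterList` is a membership test rather than an equality test. Note that a fixed list only matches the values it names: with the list used above, any turn numbered above 12 would be missed. The minimal sketch below illustrates this; `ToySpeech` and `filter_list` are hypothetical, not dicesapi's implementation.

```python
from dataclasses import dataclass


@dataclass
class ToySpeech:
    """Hypothetical stand-in for a speech record."""
    id: int
    part: int


def filter_list(items, attribute, values):
    """Keep the items whose `attribute` value is one of `values`."""
    allowed = set(values)  # assumes the values are hashable literals
    return [item for item in items if getattr(item, attribute) in allowed]


toy_speeches = [ToySpeech(1, 1), ToySpeech(2, 2), ToySpeech(3, 4), ToySpeech(4, 14)]
even = filter_list(toy_speeches, 'part', [2, 4, 6, 8, 10, 12])
print([s.part for s in even])  # [2, 4]  (turn 14 is not in the list)
```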

#### `advancedFilter(func)`

`advancedFilter` lets you supply your own filter function. The function can test anything, as long as it takes a single argument (an element of the DataGroup) and returns a boolean value. For many more examples of this method, see the "Using and Understanding Filters" notebook.

In [None]:
# a better way to check even numbered turns,
# especially if you're going to use it more than once

def filterFunction(speech):
    '''check whether speech is even-numbered turn'''
    return speech.part % 2 == 0

filtered = ach_spkr.advancedFilter(filterFunction)
print(len(filtered))

In [None]:
# for one-offs, use lambda

filtered = ach_spkr.advancedFilter(lambda s: s.part % 2 == 0)
print(len(filtered))
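Filtering with an arbitrary predicate is the most general form of filtering, and it behaves like Python's built-in `filter()`. The minimal sketch below uses hypothetical `ToySpeech` and `advanced_filter` names; it shows that the modulo test also catches turn numbers that a fixed list such as `[2, 4, ..., 12]` would miss.

```python
from dataclasses import dataclass


@dataclass
class ToySpeech:
    """Hypothetical stand-in for a speech record."""
    id: int
    part: int


def advanced_filter(items, func):
    """Keep the items for which func(item) is true."""
    return [item for item in items if func(item)]


toy_speeches = [ToySpeech(i, part) for i, part in enumerate([1, 2, 3, 4, 14], start=1)]

even = advanced_filter(toy_speeches, lambda s: s.part % 2 == 0)
print([s.part for s in even])  # [2, 4, 14]

# The built-in filter() gives the same result
print([s.part for s in filter(lambda s: s.part % 2 == 0, toy_speeches)])  # [2, 4, 14]
```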