## Okay, we have our Series and GroupBy objects.

Now we will use them in an object that helps us modify and analyze data.

In [25]:
import csv
from phoenixcel.dataframe import Series, GroupBy

class DataFrame:
    def __init__(self):
        self._dictionary = {}
        self._list = []
    
    # Ways to create an instance
    @classmethod
    def from_csv(cls, file_path):
        df = cls()
        header_unread = True                
        
        with open(file_path) as f:
            reader = csv.DictReader(f)
            
            for row in reader:
                if header_unread:
                    for key in row.keys():
                        df._dictionary[key] = Series()
                    header_unread = False
                else:
                    df._list.append(row)
                    
                for key in row.keys():
                    df._dictionary[key].append(row[key])
            
            for key in list(df._dictionary.keys()):
                setattr(df, key.lower().replace(" ", "_"), df._dictionary[key])
        return df
    
    @classmethod
    def from_rows(cls, rows):  # rows: a list of dicts, one per row
        df = cls()
        
        for key in rows[0].keys():
            df._dictionary[key] = Series()
        for row in rows:
            for key in rows[0].keys():
                df._dictionary[key].append(row[key])
            df._list.append(row)
                
        for key in list(df._dictionary.keys()):
            setattr(df, key.lower().replace(" ", "_"), df._dictionary[key])

        return df

    @classmethod
    def from_dictionary(cls, dictionary):
        df = cls()

        df._dictionary = dictionary
        for i in range(len(dictionary[list(dictionary.keys())[0]])):
            item = {}
            for key in dictionary.keys():
                item[key] = dictionary[key][i]
            df._list.append(item)
                
        for key in list(df._dictionary.keys()):
            setattr(df, key.lower().replace(" ", "_"), df._dictionary[key])

        return df

    # Properties
    @property
    def shape(self):
        # Returns (number of columns, number of rows)
        return (
            len(self._dictionary.keys()),
            len(self._dictionary[list(self._dictionary.keys())[0]]),
        )
    
    @property
    def columns(self):
        return list(self._dictionary.keys())
    
    # Methods for getting a column in the dictionary
    def __getitem__(self, item):
        '''
        Get a reference to a column in the dataframe.
        
        Input:
          item - the column header
        
        Output:
          the column, which is a series
        
        Modifies:
          Nothing
        '''
        return self._dictionary[item]
    
    # Method for setting a column in the dictionary
    def __setitem__(self, key, value):
        '''
        Set a new column in the dataframe.
        
        Inputs:
          key - the column header
          value - the column (as a Series for consistency, please)
          
        Outputs:
          None
        
        Modifies:
          Modifies the dataframe object in place.
        '''
        self._dictionary[key] = value
        for index, item in enumerate(self._list):
            item[key] = value[index]
           
        setattr(self, key.lower().replace(" ", "_"), self._dictionary[key])
        
    def where(self, condition):
        rows = [row for row in self._list if condition(row)]
        print(f"Result of printing rows: {rows}")
        return DataFrame.from_rows(rows)
    
    def group_by(self, column):
        '''
        Returns an object that aggregates the items in the dataframe
        based on one value that they have in common,
        similar to a pivot table in the software to which
        phoenixcel's name pays tribute (Please don't sue me, Microsoft).
        
        Inputs:
          column - the column on whose value the items should be grouped
          
        Outputs:
          A new GroupBy() object
        
        Modifies:
          Nothing
        '''
        groups = GroupBy()
        for item in self._list:
            maybe_unique_column_value = item[column]
            if maybe_unique_column_value in groups.keys():
                groups[maybe_unique_column_value].append(item)
            else:
                groups[maybe_unique_column_value] = Series()
                groups[maybe_unique_column_value].append(item)
        return groups
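
The grouping loop in `group_by` can be tried on its own. The sketch below is a stand-in, not the phoenixcel implementation: it assumes a plain `dict` in place of `GroupBy` and a plain `list` in place of `Series`, and the function name `group_rows` and the sample rows are made up for illustration.

```python
def group_rows(rows, column):
    """Group a list of row dicts by the value each row holds in `column`."""
    groups = {}  # stand-in for GroupBy (assumption)
    for row in rows:
        # setdefault creates the empty list the first time a value is seen
        groups.setdefault(row[column], []).append(row)
    return groups


# Illustrative rows, shaped like the dicts stored in a row list
sample_rows = [
    {"species": "oriole", "weight": "4.23"},
    {"species": "blue jay", "weight": "5.23"},
    {"species": "oriole", "weight": "4.17"},
]

grouped = group_rows(sample_rows, "species")
print(list(grouped.keys()))
print(len(grouped["oriole"]))
```

Each key in the result is one distinct value of the chosen column, and each value collects the whole rows that share it, which is the same shape the loop above builds.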

### Challenge: Study this code and answer some questions about it.

1. The `DataFrame` has three class methods on it. What do these class methods do? Why do you think that they are class methods as opposed to instance methods?

2. The `DataFrame` has a `_dictionary` and a `_list` on it. Why does it need each of these? Take a look at the methods to figure out why and how these are needed.

3. The `DataFrame` has a `_dictionary` and a `_list`, but it does not inherit from either of those classes. Why do you think this is?

4. There are some lines in the class methods, and a similar one in the `__setitem__` method, like:

```
for key in list(df._dictionary.keys()):
  setattr(df, key.lower().replace(" ", "_"), df._dictionary[key])
```

What do those lines do?
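
If the built-in `setattr` is new to you, it may help to try it away from `DataFrame` first. This is only a hint for question 4: the `Box` class, the header text, and the values are hypothetical and have nothing to do with phoenixcel.

```python
class Box:
    """An empty hypothetical class used only to demonstrate setattr."""


box = Box()
header = "Specimen ID"

# Build a valid Python identifier from the header text
attribute_name = header.lower().replace(" ", "_")

# Equivalent to writing: box.specimen_id = ["a1", "b2"]
setattr(box, attribute_name, ["a1", "b2"])

print(attribute_name)
print(box.specimen_id)
```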

Here's a dataframe of the `birds` data for you to try out the methods on:

In [26]:
df = DataFrame.from_csv('birds.csv')

(Hint: try printing out `df._list` or `df._dictionary` to understand what's in there!)

In [27]:
df._dictionary

{'species': ['oriole',
  'oriole',
  'oriole',
  'oriole',
  'oriole',
  'oriole',
  'oriole',
  'blue jay',
  'blue jay',
  'blue jay',
  'blue jay',
  'blue jay',
  'blue jay',
  'blue jay',
  'titmouse',
  'titmouse',
  'titmouse',
  'titmouse',
  'titmouse',
  'titmouse',
  'titmouse'],
 'specimen_id': ['7ss24g6t7f2dr4h32',
  '7dr4h32ss24g6t7f2',
  'g6t7f2dr4h327ss24',
  't77ss24g6f2dr4h32',
  '327ss24g6t7f2dr4h',
  '6t7ss24g5f2dr4h32',
  '2f2dr4h37ss24g6t7',
  '88Jnnb323es29bs2f',
  '3d329bs2f24g6t7f2',
  'g6t3f2dr4h322ss24',
  'f2dr4t76ss24g6h32',
  't7f2312ss24g6dr4h',
  '5f26t1ss24gdr4h32',
  '9f237ss24g6t8dr4h',
  '1sn32ufks82d92b39',
  '8sh2bdn4s24g6t7f2',
  'h38snsdr4h327ss24',
  'dr4h3224g6tg5f2dr',
  '32bf72f9m27f2dr4h',
  '2b47f29fn34h47dn3',
  't77ss24g6f27s41md'],
 'weight': ['4.23',
  '4.17',
  '5.21',
  '3.22',
  '4.00',
  '3.98',
  '5.00',
  '5.23',
  '6.32',
  '5.21',
  '4.85',
  '4.91',
  '5.69',
  '5.22',
  '2.13',
  '3.10',
  '2.21',
  '2.22',
  '3.00',
  '2.98',

### Challenge: Write a docstring for the `where` method.

Be sure to include:

1. A summary of what it does
2. What it takes as arguments
3. What it returns
4. Whether it changes any objects outside the scope of the function itself!

In [22]:
orioles_only = df.where(lambda row: row['species'] == "oriole")

Result of printing rows: [{'species': 'oriole', 'specimen_id': '7dr4h32ss24g6t7f2', 'weight': '4.17'}, {'species': 'oriole', 'specimen_id': 'g6t7f2dr4h327ss24', 'weight': '5.21'}, {'species': 'oriole', 'specimen_id': 't77ss24g6f2dr4h32', 'weight': '3.22'}, {'species': 'oriole', 'specimen_id': '327ss24g6t7f2dr4h', 'weight': '4.00'}, {'species': 'oriole', 'specimen_id': '6t7ss24g5f2dr4h32', 'weight': '3.98'}, {'species': 'oriole', 'specimen_id': '2f2dr4h37ss24g6t7', 'weight': '5.00'}]


In [23]:
orioles_only._dictionary

{'species': ['oriole', 'oriole', 'oriole', 'oriole', 'oriole', 'oriole'],
 'specimen_id': ['7dr4h32ss24g6t7f2',
  'g6t7f2dr4h327ss24',
  't77ss24g6f2dr4h32',
  '327ss24g6t7f2dr4h',
  '6t7ss24g5f2dr4h32',
  '2f2dr4h37ss24g6t7'],
 'weight': ['4.17', '5.21', '3.22', '4.00', '3.98', '5.00']}
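
Note that the output above lists six orioles, while `df._dictionary` holds seven: specimen `7ss24g6t7f2dr4h32`, the first row of the file, is missing. This matches the `from_csv` listing above, which appends a row to `_list` only in the `else` branch of the header check, so the first data row reaches `_dictionary` but not `_list`. The sketch below reproduces that loop on an in-memory CSV; the two-row sample and the variable names are illustrative assumptions, with plain lists standing in for `Series`.

```python
import csv
import io

# Two-row stand-in for birds.csv (illustrative data)
sample = io.StringIO("species,weight\noriole,4.23\noriole,4.17\n")

columns = {}
row_list = []
header_unread = True

for row in csv.DictReader(sample):
    if header_unread:
        for key in row.keys():
            columns[key] = []
        header_unread = False
    else:
        # Only rows after the first one reach this branch
        row_list.append(row)

    for key in row.keys():
        columns[key].append(row[key])

print(len(columns["species"]))  # values stored per column
print(len(row_list))            # rows stored in the list
```

`csv.DictReader` consumes the header line itself, so every `row` it yields is already a data row. Appending to the row list unconditionally would keep the two structures in step.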

### Challenge: What happens when you call the `where` method with a condition that returns no results?

In [28]:
humans_df = df.where(lambda row: row['species'] == "human") 

Result of printing rows: []


In [29]:
humans_df._dictionary

{}

Is that what you would _expect_ to happen? If not, what _should_ happen?

Amend the implementation of `DataFrame` to resolve the issue.

**Is there any other similar function in DataFrame that might have the same issue? Which one is it?**
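
While working on this, keep in mind that the `from_rows` listing above reads its column names from `rows[0]`. With the code exactly as listed, an empty result would raise an `IndexError` rather than produce `{}`, so the output shown above may come from a different revision of the class. A minimal, standalone demonstration of that indexing behavior (the variable names are illustrative):

```python
rows = []  # what `where` collects when no row matches

try:
    # Same pattern as reading the column names from the first row
    column_names = list(rows[0].keys())
except IndexError as error:
    column_names = None
    print(f"IndexError: {error}")
```

Any other place in the class that indexes the first element of a possibly empty collection is worth checking for the same pattern.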