# **Dora**

Dora is a Python library designed for exploratory data analysis.

The library contains convenience functions for data cleaning, feature selection & extraction, visualization, partitioning data for model validation, and versioning transformations of data.

The library builds on, and is intended to complement, common Python data analysis tools such as pandas, scikit-learn, and matplotlib.




**Cleansing functions include:**

1. Reading data with missing and poorly scaled values
2. Imputing missing values
3. Scaling values of input variables
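
As a rough illustration of items 2 and 3, the sketch below fills missing values with the column mean and then rescales each column to zero mean and unit variance, using plain pandas. The toy `frame` and the choice of mean imputation and standardization are assumptions made for this example. They are not taken from Dora's implementation.

```python
import pandas as pd

# Toy data (hypothetical): one missing value and two very different scales.
frame = pd.DataFrame({
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.93, 53.10],
})

# Impute: replace each missing value with the mean of its column.
imputed = frame.fillna(frame.mean())

# Scale: subtract the column mean, then divide by the population
# standard deviation (ddof=0), so every column ends up comparable.
scaled = (imputed - imputed.mean()) / imputed.std(ddof=0)

print(imputed)
print(scaled.round(3))
```

Imputing before scaling matters here: the mean and standard deviation used for scaling are computed on the completed column, so no missing values propagate into the result.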


# **Installation**

In [1]:
!pip install dora

Collecting dora
  Downloading Dora-0.0.3.tar.gz (4.9 kB)
Collecting sklearn
  Using cached sklearn-0.0.tar.gz (1.1 kB)
Building wheels for collected packages: dora, sklearn
  Building wheel for dora (setup.py): started
  Building wheel for dora (setup.py): finished with status 'done'
  Created wheel for dora: filename=Dora-0.0.3-py3-none-any.whl size=3364 sha256=54332bcc0ee08f8ae4a511b9f56b53d82d5c6c3d653582716467fedb69cefb37
  Stored in directory: c:\users\aabha gupta\appdata\local\pip\cache\wheels\e5\01\df\30896006ee88f3f23ebe4960474a8df25958eeacc1ca26d8bd
  Building wheel for sklearn (setup.py): started
  Building wheel for sklearn (setup.py): finished with status 'done'
  Created wheel for sklearn: filename=sklearn-0.0-py2.py3-none-any.whl size=1316 sha256=1efdf5e0cdfc7e0cb3e1cb727304b46c930fc90a5f362c1125a0264cca62aaec
  Stored in directory: c:\users\aabha gupta\appdata\local\pip\cache\wheels\22\0b\40\fd3f795caaa1fb4c6cb738bc1f56100be1e57da95849bfc897
Successfully built dora sklea

# **Import and Initialize Dora**

In [2]:
from Dora import Dora
dora = Dora()  # Initialize Dora

In [4]:
import pandas as pd

# Load the Titanic dataset
df = pd.read_csv('C:/Users/Aabha Gupta/Downloads/workbench-dataset/train.txt')

In [5]:
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In this dataset, ***Survived*** is the output (target) variable. We pass the DataFrame to Dora together with the name of that column, which creates a Dora dataset and defines its target variable.

In [6]:
dora.configure(output='Survived', data=df)
dora.data

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


### Remove some features

In [7]:
dora.remove_feature('PassengerId')

In [8]:
dora.data

Unnamed: 0,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C
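
`PassengerId` is a row identifier rather than a property of the passenger, which is why it is the feature removed here. For comparison, a minimal pandas sketch of the same operation on a toy frame follows. The `toy` data is hypothetical. Note one difference: the `dora.remove_feature` call above changed `dora.data` itself, whereas `DataFrame.drop` returns a new frame and leaves the original untouched.

```python
import pandas as pd

# Toy stand-in (hypothetical) for the Titanic frame.
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
})

# drop returns a new frame; `toy` still contains PassengerId afterwards.
trimmed = toy.drop(columns=["PassengerId"])

print(list(trimmed.columns))
```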


### Perform **one-hot encoding**

In [9]:
dora.data.Embarked.unique()

array(['S', 'C', 'Q', nan], dtype=object)

In [10]:
dora.extract_ordinal_feature('Embarked')

In [11]:
dora.data

Unnamed: 0,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked=C,Embarked=Q,Embarked=S,Embarked=nan
0,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,0.0,0.0,1.0,0.0
1,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,1.0,0.0,0.0,0.0
2,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,0.0,0.0,1.0,0.0
3,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,0.0,0.0,1.0,0.0
4,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,0.0,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,0.0,0.0,1.0,0.0
887,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,0.0,0.0,1.0,0.0
888,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,0.0,0.0,1.0,0.0
889,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,1.0,0.0,0.0,0.0
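
In the output above, the single `Embarked` column has been replaced by one indicator column per distinct value, including one for missing values (`Embarked=nan`). Despite the method name `extract_ordinal_feature`, this is a one-hot encoding. The sketch below reproduces the same shape of result with pandas on a toy frame. The `toy` data is hypothetical, and `prefix_sep="="` is chosen only to mirror the column names shown above; it is not a claim about how Dora builds them.

```python
import numpy as np
import pandas as pd

# Toy column (hypothetical) with the same four values seen in Embarked.
toy = pd.DataFrame({"Embarked": ["S", "C", "Q", np.nan]})

# One indicator column per category; dummy_na=True adds a column
# that flags rows where the value is missing.
indicators = pd.get_dummies(
    toy["Embarked"],
    prefix="Embarked",
    prefix_sep="=",
    dummy_na=True,
    dtype=float,
)

# Swap the original column for its indicator columns.
encoded = pd.concat([toy.drop(columns=["Embarked"]), indicators], axis=1)

print(encoded)
```

Each row has exactly one indicator set to 1.0, so the original category can always be recovered from the encoded columns.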


### Feature Transformation

In [12]:
dora.extract_feature('Age', 'New Age', lambda x: x/2)

In [13]:
dora.data.head()

Unnamed: 0,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked=C,Embarked=Q,Embarked=S,Embarked=nan,New Age
0,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,0.0,0.0,1.0,0.0,11.0
1,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,1.0,0.0,0.0,0.0,19.0
2,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,0.0,0.0,1.0,0.0,13.0
3,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,0.0,0.0,1.0,0.0,17.5
4,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,0.0,0.0,1.0,0.0,17.5
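
The call above takes an existing column (`Age`), a name for the new column (`New Age`), and a function applied to each value. The original column is kept, as the output shows. A minimal pandas sketch of the same idea on a toy frame follows. The `toy` data is hypothetical, and the sketch is an equivalent written for illustration, not Dora's own code.

```python
import pandas as pd

# Toy column (hypothetical) with the first few ages from the output above.
toy = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0]})

# Apply the function to each value and store the result under a new name;
# the source column stays in place.
toy["New Age"] = toy["Age"].apply(lambda x: x / 2)

print(toy)
```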


See the Dora [documentation](https://github.com/NathanEpstein/Dora) to learn more.