# Getting started in 30 seconds

Let's import `diffprivlib` and a few other utilities that will be useful for this 30-second example.

```python
import diffprivlib as dpl
from sklearn import datasets
from sklearn.model_selection import train_test_split
```

For this example, let's load the Iris dataset and perform an 80/20 train/test split.

```python
dataset = datasets.load_iris()

X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.2)
```
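The Iris dataset has 150 samples, so `test_size=0.2` leaves 120 for training and 30 for testing, which matches the 30 predictions printed further down. The split is shuffled randomly each time the cell runs; scikit-learn's `train_test_split` accepts a `random_state` argument to make it repeatable. The pure-Python sketch below mimics a seeded shuffle split so the arithmetic is visible. `shuffle_split` is a hypothetical helper, and rounding the test count up is our understanding of how scikit-learn handles a fractional `test_size`.

```python
import math
import random


def shuffle_split(n_samples, test_size=0.2, seed=None):
    """Hypothetical helper: return (train_indices, test_indices) for a shuffled split."""
    # Round the test count up, then give the remaining samples to training.
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    # A seeded generator makes the shuffle (and hence the split) reproducible.
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = shuffle_split(150, test_size=0.2, seed=0)
print(len(train_idx), len(test_idx))  # 120 30
```

Every index lands in exactly one of the two lists, so no flower is used for both training and testing.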

Now, let's train a differentially private naive Bayes classifier and test its accuracy. `dpl.models.GaussianNB` can be run __without any parameters__, although, strictly speaking, this is undesirable and will emit a warning (check out the other notebooks for more details).

```python
clf = dpl.models.GaussianNB()
clf.fit(X_train, y_train)
```

```
GaussianNB(bounds=[(4.3, 7.9), (2.2, 4.4), (1.0, 6.9), (0.1, 2.5)],
      epsilon=None, priors=None, var_smoothing=1e-09)
```
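Because no bounds were supplied, the `bounds` shown in the output appear to have been computed from the training data itself; in diffprivlib this is what triggers the warning, since the data's minimum and maximum are themselves private information. Bounds matter because they cap how far any single record can move a statistic. As a simplified illustration (the textbook Laplace mechanism for a bounded mean, not necessarily how diffprivlib splits its privacy budget internally), the noise scale for a mean of `n` values in `[lower, upper]` is `(upper - lower) / (n * epsilon)`. The helper `laplace_scale_for_mean` is hypothetical.

```python
def laplace_scale_for_mean(lower, upper, n, epsilon):
    """Hypothetical helper: Laplace noise scale for a mean of n values in [lower, upper]."""
    if n <= 0 or epsilon <= 0:
        raise ValueError("n and epsilon must be positive")
    # Replacing one record moves the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    # The Laplace mechanism adds noise with scale sensitivity / epsilon.
    return sensitivity / epsilon


# First Iris feature, using the bounds (4.3, 7.9) from the output above and
# roughly 40 rows per class in a 120-row training set (illustrative count).
for eps in (0.1, 1.0, 10.0):
    print(eps, laplace_scale_for_mean(4.3, 7.9, n=40, epsilon=eps))
```

Wider bounds or a smaller `epsilon` mean more noise, which is the trade-off the warning is nudging you to control explicitly.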

We can now classify unseen examples, knowing that the trained model is differentially private and preserves the privacy of the 'individuals' of the training dataset (flowers are entitled to their privacy too!).

```python
clf.predict(X_test)
```

```
array([0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 1, 2, 1, 0, 2, 0, 1,
       0, 1, 1, 0, 0, 0, 1, 0])
```
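A Gaussian naive Bayes classifier predicts the class with the highest log prior plus summed per-feature Gaussian log-likelihood. In the differentially private version, the per-class means and variances are the noisy ones learned during `fit`, so by the post-processing property of differential privacy, prediction needs no extra noise. The sketch below spells out that decision rule in pure Python; `gaussian_nb_predict` and the toy parameters are hypothetical and are not read from the trained `clf`.

```python
import math


def gaussian_nb_predict(x, params):
    """Hypothetical helper: params maps class -> (prior, means, variances)."""
    best_class, best_score = None, -math.inf
    for cls, (prior, means, variances) in params.items():
        # Start from the log prior probability of the class.
        score = math.log(prior)
        # Add each feature's Gaussian log-density (features assumed independent).
        for value, mu, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        if score > best_score:
            best_class, best_score = cls, score
    return best_class


# Toy two-feature parameters for two classes (illustrative values only).
toy_params = {
    0: (0.5, [1.0, 2.0], [0.1, 0.2]),
    1: (0.5, [3.0, 4.0], [0.1, 0.2]),
}
print(gaussian_nb_predict([1.1, 2.1], toy_params))  # 0
print(gaussian_nb_predict([2.9, 3.8], toy_params))  # 1
```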

Retraining the model on the same training data will generally give a different accuracy each time, because differential privacy injects random noise during training. Try it for yourself to find out!

```python
(clf.predict(X_test) == y_test).sum() / y_test.shape[0]
```

```
0.8
```
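The 0.8 above is a single draw: because the model's statistics are perturbed with random noise during `fit`, retraining yields slightly different parameters and possibly a different accuracy. The pure-Python sketch below illustrates the effect on a single noisy mean using the Laplace mechanism. `noisy_mean` is hypothetical, the data are illustrative, and the Laplace sample is drawn as the difference of two exponential variates, a standard construction rather than diffprivlib's own sampler.

```python
import random


def noisy_mean(values, lower, upper, epsilon, rng):
    """Hypothetical helper: epsilon-DP mean of values clipped to [lower, upper]."""
    # Clip every value so the sensitivity bound actually holds.
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    scale = (upper - lower) / (len(clipped) * epsilon)
    # Laplace(0, scale) sample as the difference of two exponentials with mean `scale`.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_mean + noise


data = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0]  # illustrative sepal lengths
for seed in range(3):
    # Each "retraining" draws fresh noise, so the released mean changes.
    print(noisy_mean(data, 4.3, 7.9, epsilon=1.0, rng=random.Random(seed)))
```

Reusing the same seed reproduces the same noisy value, while a new seed gives a new one; the same mechanism is why repeated runs of the notebook can report different accuracies.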

Congratulations! You've run your first differentially private data analysis with the Differential Privacy Library!