# Handling bad, missing, and duplicate data

**Abid Ali**
Skype: Abd.Soft
Email: [abdsoftfsd@gmail.com](mailto:abdsoftfsd@gmail.com)
Github: [github.com/abid-2362](https://github.com/abid-2362)

## Cleaning Bad Data
- What is "bad" data?
- Define your goal
- Drop, fill, or replace

In [2]:
import pandas as pd


In [3]:
data = pd.read_csv('data/artwork_data.csv', low_memory=False)
data.head()

Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,date not known,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,,1922.0,support: 394 x 419 mm,394,419,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
1,1036,A00002,"Blake, Robert",artist,38,"Two Drawings of Frightened Figures, Probably f...",date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 311 x 213 mm,311,213,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-two-...
2,1037,A00003,"Blake, Robert",artist,38,The Preaching of Warning. Verso: An Old Man En...,?c.1785,Graphite on paper. Verso: graphite on paper,Presented by Mrs John Richmond 1922,1785.0,1922.0,support: 343 x 467 mm,343,467,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
3,1038,A00004,"Blake, Robert",artist,38,Six Drawings of Figures with Outstretched Arms,date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 318 x 394 mm,318,394,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-six-...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919.0,image: 243 x 335 mm,243,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...


- Strip white space
- Replace bad data
- Fill missing data
- Drop bad data
- Drop duplicate data
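
Before choosing between stripping, replacing, filling, and dropping, it can help to measure how much of each problem is present. The following is a minimal sketch on a hypothetical toy frame (`toy` and its values are made up for illustration and are not taken from the artwork file):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the artwork data.
toy = pd.DataFrame({
    "title": ["Boys ", "Bowl", "Bowl", None],
    "year": ["1967", "no date", "no date", "1990"],
    "depth": [4.0, 3.0, 3.0, np.nan],
})

# Number of missing values (NaN/None) in each column.
missing_per_column = toy.isna().sum()

# Number of rows that are exact copies of an earlier row.
duplicate_rows = int(toy.duplicated().sum())

# Number of titles ending in white space; na=False treats missing titles as no match.
trailing_space = int(toy["title"].str.contains(r"\s$", regex=True, na=False).sum())

print(missing_per_column)
print(duplicate_rows, trailing_space)
```

Counts like these give a rough idea of which of the steps above is worth the effort for a given column.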

## Stripping White Space


In [4]:
data.loc[data.title.str.contains(r'\s$', regex=True), 'title']


49498    Port-Distinguishing Letters of Scottish Fishin...
50534                                                Boys 
50535                                           Buildings 
50537                                    Pray for Josquin 
53186                              Untitled (Safety Pins) 
56283    Towards a definitive statement on the coming t...
67409                          2 Standard Ware Mead Cups  
67432                                               Bowl  
Name: title, dtype: object

In [5]:
data.title.str.strip()


0        A Figure Bowing before a Seated Old Man with h...
1        Two Drawings of Frightened Figures, Probably f...
3           Six Drawings of Figures with Outstretched Arms
4        The Circle of the Lustful: Francesca da Rimini...
                               ...                        
69196                          Larvae (from Tampax Romana)
69197                     Living Womb (from Tampax Romana)
69198                                        Present Tense
69199            Work No. 227: The lights going on and off
69200                     Dancing Scene in the West Indies
Name: title, Length: 69201, dtype: object

In [6]:
data.loc[data.title.str.contains(r'\s$', regex=True), 'title']


49498    Port-Distinguishing Letters of Scottish Fishin...
50534                                                Boys 
50535                                           Buildings 
50537                                    Pray for Josquin 
53186                              Untitled (Safety Pins) 
56283    Towards a definitive statement on the coming t...
67409                          2 Standard Ware Mead Cups  
67432                                               Bowl  
Name: title, dtype: object

In [7]:
data.title = data.title.str.strip()


In [8]:
data.loc[data.title.str.contains(r'\s$', regex=True), 'title']


Series([], Name: title, dtype: object)

In [9]:
# Right and left strip
# data.title = data.title.str.rstrip()
# data.title = data.title.str.lstrip()


In [10]:
# Strip with a lambda function (unlike .str.strip(), this raises if a title is NaN)
data.title.transform(lambda x: x.strip())


0        A Figure Bowing before a Seated Old Man with h...
1        Two Drawings of Frightened Figures, Probably f...
3           Six Drawings of Figures with Outstretched Arms
4        The Circle of the Lustful: Francesca da Rimini...
                               ...                        
69196                          Larvae (from Tampax Romana)
69197                     Living Womb (from Tampax Romana)
69198                                        Present Tense
69199            Work No. 227: The lights going on and off
69200                     Dancing Scene in the West Indies
Name: title, Length: 69201, dtype: object
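
The lambda version works here only because every title is a string. A small sketch of the difference, using a made-up Series (`titles` is hypothetical) that contains one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical titles, one of them missing.
titles = pd.Series(["Boys ", "  Bowl  ", np.nan], name="title")

# The .str accessor skips missing values and leaves them as NaN.
stripped = titles.str.strip()

# A plain lambda calls .strip() on every element, so the float NaN fails.
# (Some pandas versions re-raise the failure as ValueError.)
try:
    titles.transform(lambda x: x.strip())
    lambda_failed = False
except (AttributeError, ValueError):
    lambda_failed = True

# Guarding the lambda makes it tolerate missing values as well.
guarded = titles.transform(lambda x: x.strip() if isinstance(x, str) else x)

print(stripped.tolist())
print(lambda_failed)
```

For plain white-space stripping the `.str` accessor is usually the simpler choice; a lambda is mainly useful when the per-value logic goes beyond what the accessor offers.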

## Replacing Bad Data with NaN

In [11]:
original_data = pd.read_csv('data/artwork_data.csv',low_memory=False)
data = original_data.copy()

In [12]:
data[['title','dateText']].head()


Unnamed: 0,title,dateText
0,A Figure Bowing before a Seated Old Man with h...,date not known
1,"Two Drawings of Frightened Figures, Probably f...",date not known
2,The Preaching of Warning. Verso: An Old Man En...,?c.1785
3,Six Drawings of Figures with Outstretched Arms,date not known
4,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892"


In [13]:
# data.loc[pd.isna(data.loc[:, 'dateText']) == True]
pd.isna(data.loc[:, 'dateText'])

0        False
1        False
2        False
3        False
4        False
         ...  
69196    False
69197    False
69198    False
69199    False
69200    False
Name: dateText, Length: 69201, dtype: bool

In [14]:
from numpy import nan


In [15]:
# Replace 'date not known' in the dateText column with NaN.
# This returns a new DataFrame; data itself is not changed.
data.replace({'dateText': {'date not known': nan}})


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,,1922.0,support: 394 x 419 mm,394,419,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
1,1036,A00002,"Blake, Robert",artist,38,"Two Drawings of Frightened Figures, Probably f...",,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 311 x 213 mm,311,213,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-two-...
2,1037,A00003,"Blake, Robert",artist,38,The Preaching of Warning. Verso: An Old Man En...,?c.1785,Graphite on paper. Verso: graphite on paper,Presented by Mrs John Richmond 1922,1785,1922.0,support: 343 x 467 mm,343,467,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
3,1038,A00004,"Blake, Robert",artist,38,Six Drawings of Figures with Outstretched Arms,,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 318 x 394 mm,318,394,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-six-...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826,1919.0,image: 243 x 335 mm,243,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69196,122960,T13865,"P-Orridge, Genesis",artist,16646,Larvae (from Tampax Romana),1975,"Perspex, Wood, hairpiece, tampon and human blood",Transferred from Tate Archive 2012,1975,2013.0,object: 305 x 305 x 135 mm,305,305,135.0,mm,,,,http://www.tate.org.uk/art/artworks/p-orridge-...
69197,122961,T13866,"P-Orridge, Genesis",artist,16646,Living Womb (from Tampax Romana),1976,"Wood, Perspex, plastic, photograph on paper, t...",Transferred from Tate Archive 2012,1976,2013.0,object: 305 x 305 x 135 mm,305,305,135.0,mm,,,,http://www.tate.org.uk/art/artworks/p-orridge-...
69198,121181,T13867,"Hatoum, Mona",artist,2365,Present Tense,1996,Soap and glass beads,Presented by Tate Members 2013,1996,2013.0,displayed: 45 x 2410 x 2990 mm,45,2410,2990.0,mm,,,,http://www.tate.org.uk/art/artworks/hatoum-pre...
69199,112306,T13868,"Creed, Martin",artist,2760,Work No. 227: The lights going on and off,2000,Gallery lighting,"Purchased with funds provided by Tate Members,...",2000,2013.0,Overall display dimensions variable,,,,,,,,http://www.tate.org.uk/art/artworks/creed-work...


In [16]:
data[['title', 'dateText']].head()


Unnamed: 0,title,dateText
0,A Figure Bowing before a Seated Old Man with h...,date not known
1,"Two Drawings of Frightened Figures, Probably f...",date not known
2,The Preaching of Warning. Verso: An Old Man En...,?c.1785
3,Six Drawings of Figures with Outstretched Arms,date not known
4,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892"
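
The placeholder is still there because `replace` returns a new DataFrame by default. A minimal sketch of that behaviour on a hypothetical three-row frame (`frame` is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one placeholder value.
frame = pd.DataFrame({"dateText": ["date not known", "?c.1785", "1967"]})

# Without inplace=True (or reassignment) the original frame is untouched.
replaced = frame.replace({"dateText": {"date not known": np.nan}})

before = int(frame["dateText"].isna().sum())    # original still holds the placeholder
after = int(replaced["dateText"].isna().sum())  # the returned frame holds the NaN
print(before, after)

# Reassigning the result is an alternative to inplace=True.
frame = frame.replace({"dateText": {"date not known": np.nan}})
```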


In [17]:
# inplace=True modifies data directly instead of returning a new DataFrame
data.replace({'dateText': {'date not known': nan}}, inplace=True)
data[['title', 'dateText']].head()


Unnamed: 0,title,dateText
0,A Figure Bowing before a Seated Old Man with h...,
1,"Two Drawings of Frightened Figures, Probably f...",
2,The Preaching of Warning. Verso: An Old Man En...,?c.1785
3,Six Drawings of Figures with Outstretched Arms,
4,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892"


In [18]:
# Restore the unmodified data from the copy kept in original_data
data = original_data.copy()


In [19]:
# set NaN where dateText value is 'date not known'
data.loc[data.dateText == 'date not known', 'dateText'] = nan


In [20]:
data[['title', 'dateText']].head()


Unnamed: 0,title,dateText
0,A Figure Bowing before a Seated Old Man with h...,
1,"Two Drawings of Frightened Figures, Probably f...",
2,The Preaching of Warning. Verso: An Old Man En...,?c.1785
3,Six Drawings of Figures with Outstretched Arms,
4,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892"


In [21]:
data.loc[data.year.notnull() & data.year.astype(str).str.contains('[^0-9]', regex=True), ['year']]


Unnamed: 0,year
67968,no date
67980,no date
67987,no date
67994,no date
68002,no date
68015,no date
68033,no date
68042,no date
68047,no date
68051,no date


In [22]:
# set bad values to NaN
data.loc[data.year.notnull() & data.year.astype(str).str.contains('[^0-9]', regex=True), ['year']] = nan


In [23]:
data['year'].head()


0     NaN
1     NaN
2    1785
3     NaN
4    1826
Name: year, dtype: object
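
The column is still `object` dtype after the bad values are set to NaN. One alternative, not used in this notebook, is `pd.to_numeric` with `errors="coerce"`, which converts the column and turns unparseable values into NaN in one step. A sketch on a hypothetical Series (`year` here is made up, not the real column):

```python
import pandas as pd

# Hypothetical year values stored as text, mirroring the object-dtype column.
year = pd.Series(["1785", "no date", None, "1826"], name="year", dtype=object)

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
year_numeric = pd.to_numeric(year, errors="coerce")

print(year_numeric)
```

The trade-off is that coercion hides which values were bad, so inspecting them first, as in the cells above, is still worthwhile.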

In [24]:
data.iloc[69165:69166]


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
69165,121190,T13834,"Barlow, Phyllida",artist,10908,Untitled,c.1997–9,Acrylic paint on paper,Presented by the Trustees of the Chantrey Bequ...,,2013.0,unconfirmed: 389 x 562 mm,389,562,,mm,,© Phyllida Barlow,http://www.tate.org.uk/art/images/work/T/T13/T...,http://www.tate.org.uk/art/artworks/barlow-unt...


## Filling Missing Data with a Value


In [25]:
data = original_data.copy()
data.head()

Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,date not known,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,,1922.0,support: 394 x 419 mm,394,419,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
1,1036,A00002,"Blake, Robert",artist,38,"Two Drawings of Frightened Figures, Probably f...",date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 311 x 213 mm,311,213,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-two-...
2,1037,A00003,"Blake, Robert",artist,38,The Preaching of Warning. Verso: An Old Man En...,?c.1785,Graphite on paper. Verso: graphite on paper,Presented by Mrs John Richmond 1922,1785.0,1922.0,support: 343 x 467 mm,343,467,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
3,1038,A00004,"Blake, Robert",artist,38,Six Drawings of Figures with Outstretched Arms,date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 318 x 394 mm,318,394,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-six-...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919.0,image: 243 x 335 mm,243,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...


In [26]:
# Fill every missing value with 0 (rarely what you want: text columns get 0 as well)
data.fillna(0)


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,date not known,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,0,1922.0,support: 394 x 419 mm,394,419,0.0,mm,0,0,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
1,1036,A00002,"Blake, Robert",artist,38,"Two Drawings of Frightened Figures, Probably f...",date not known,Graphite on paper,Presented by Mrs John Richmond 1922,0,1922.0,support: 311 x 213 mm,311,213,0.0,mm,0,0,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-two-...
2,1037,A00003,"Blake, Robert",artist,38,The Preaching of Warning. Verso: An Old Man En...,?c.1785,Graphite on paper. Verso: graphite on paper,Presented by Mrs John Richmond 1922,1785,1922.0,support: 343 x 467 mm,343,467,0.0,mm,0,0,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
3,1038,A00004,"Blake, Robert",artist,38,Six Drawings of Figures with Outstretched Arms,date not known,Graphite on paper,Presented by Mrs John Richmond 1922,0,1922.0,support: 318 x 394 mm,318,394,0.0,mm,0,0,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-six-...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826,1919.0,image: 243 x 335 mm,243,335,0.0,mm,0,0,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69196,122960,T13865,"P-Orridge, Genesis",artist,16646,Larvae (from Tampax Romana),1975,"Perspex, Wood, hairpiece, tampon and human blood",Transferred from Tate Archive 2012,1975,2013.0,object: 305 x 305 x 135 mm,305,305,135.0,mm,0,0,0,http://www.tate.org.uk/art/artworks/p-orridge-...
69197,122961,T13866,"P-Orridge, Genesis",artist,16646,Living Womb (from Tampax Romana),1976,"Wood, Perspex, plastic, photograph on paper, t...",Transferred from Tate Archive 2012,1976,2013.0,object: 305 x 305 x 135 mm,305,305,135.0,mm,0,0,0,http://www.tate.org.uk/art/artworks/p-orridge-...
69198,121181,T13867,"Hatoum, Mona",artist,2365,Present Tense,1996,Soap and glass beads,Presented by Tate Members 2013,1996,2013.0,displayed: 45 x 2410 x 2990 mm,45,2410,2990.0,mm,0,0,0,http://www.tate.org.uk/art/artworks/hatoum-pre...
69199,112306,T13868,"Creed, Martin",artist,2760,Work No. 227: The lights going on and off,2000,Gallery lighting,"Purchased with funds provided by Tate Members,...",2000,2013.0,Overall display dimensions variable,0,0,0.0,0,0,0,0,http://www.tate.org.uk/art/artworks/creed-work...


In [27]:
# Fill missing values with 0 in the depth column only
data.fillna(value={'depth': 0})


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,date not known,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,,1922.0,support: 394 x 419 mm,394,419,0.0,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
1,1036,A00002,"Blake, Robert",artist,38,"Two Drawings of Frightened Figures, Probably f...",date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 311 x 213 mm,311,213,0.0,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-two-...
2,1037,A00003,"Blake, Robert",artist,38,The Preaching of Warning. Verso: An Old Man En...,?c.1785,Graphite on paper. Verso: graphite on paper,Presented by Mrs John Richmond 1922,1785,1922.0,support: 343 x 467 mm,343,467,0.0,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
3,1038,A00004,"Blake, Robert",artist,38,Six Drawings of Figures with Outstretched Arms,date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 318 x 394 mm,318,394,0.0,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-six-...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826,1919.0,image: 243 x 335 mm,243,335,0.0,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69196,122960,T13865,"P-Orridge, Genesis",artist,16646,Larvae (from Tampax Romana),1975,"Perspex, Wood, hairpiece, tampon and human blood",Transferred from Tate Archive 2012,1975,2013.0,object: 305 x 305 x 135 mm,305,305,135.0,mm,,,,http://www.tate.org.uk/art/artworks/p-orridge-...
69197,122961,T13866,"P-Orridge, Genesis",artist,16646,Living Womb (from Tampax Romana),1976,"Wood, Perspex, plastic, photograph on paper, t...",Transferred from Tate Archive 2012,1976,2013.0,object: 305 x 305 x 135 mm,305,305,135.0,mm,,,,http://www.tate.org.uk/art/artworks/p-orridge-...
69198,121181,T13867,"Hatoum, Mona",artist,2365,Present Tense,1996,Soap and glass beads,Presented by Tate Members 2013,1996,2013.0,displayed: 45 x 2410 x 2990 mm,45,2410,2990.0,mm,,,,http://www.tate.org.uk/art/artworks/hatoum-pre...
69199,112306,T13868,"Creed, Martin",artist,2760,Work No. 227: The lights going on and off,2000,Gallery lighting,"Purchased with funds provided by Tate Members,...",2000,2013.0,Overall display dimensions variable,,,0.0,,,,,http://www.tate.org.uk/art/artworks/creed-work...


In [28]:
data.depth.dtype


dtype('float64')

In [29]:
data.loc[data.depth> 0, ['title', 'depth']]


Unnamed: 0,title,depth
850,[title not known],112.0
1738,Pansies,4.0
1739,Tulips,3.0
1740,West Window,3.0
1741,Lillies Against Yellow House,3.0
...,...,...
69194,Venus Mound (from Tampax Romana),135.0
69195,It’s That Time Of The Month (from Tampax Romana),135.0
69196,Larvae (from Tampax Romana),135.0
69197,Living Womb (from Tampax Romana),135.0


In [30]:
data['depth'].head()


0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
Name: depth, dtype: float64

In [31]:
data.fillna(value={'depth': 0}, inplace=True)


In [32]:
data['depth'].head()


0    0.0
1    0.0
2    0.0
3    0.0
4    0.0
Name: depth, dtype: float64
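
The dict passed to `fillna` can name several columns, each with its own fill value. A minimal sketch, assuming a hypothetical frame and a made-up fill string (`"none recorded"` is only an illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values in a numeric and a text column.
frame = pd.DataFrame({
    "depth": [np.nan, 4.0, np.nan],
    "inscription": [None, "date inscribed", None],
})

# A dict maps column name -> fill value; columns not listed keep their NaN.
filled = frame.fillna(value={"depth": 0, "inscription": "none recorded"})
only_depth = frame.fillna(value={"depth": 0})

print(filled)
print(only_depth)
```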

## Dropping Rows of Data


In [33]:
data = original_data.copy()
data.head()

Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,date not known,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,,1922.0,support: 394 x 419 mm,394,419,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
1,1036,A00002,"Blake, Robert",artist,38,"Two Drawings of Frightened Figures, Probably f...",date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 311 x 213 mm,311,213,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-two-...
2,1037,A00003,"Blake, Robert",artist,38,The Preaching of Warning. Verso: An Old Man En...,?c.1785,Graphite on paper. Verso: graphite on paper,Presented by Mrs John Richmond 1922,1785.0,1922.0,support: 343 x 467 mm,343,467,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
3,1038,A00004,"Blake, Robert",artist,38,Six Drawings of Figures with Outstretched Arms,date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 318 x 394 mm,318,394,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-six-...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919.0,image: 243 x 335 mm,243,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...


In [34]:
data.shape


(69201, 20)

In [35]:
data.dropna()


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
1738,98171,AR00001,"Katz, Alex",artist,1386,Pansies,1967,Oil paint on hardboard,ARTIST ROOMS Acquired jointly with the Nationa...,1967,2008.0,support: 309 x 404 x 4 mm frame: 322 x 424 x 3...,309,404,4.0,mm,date inscribed,© Alex Katz,http://www.tate.org.uk/art/images/work/AR/AR00...,http://www.tate.org.uk/art/artworks/katz-pansi...
1739,98172,AR00002,"Katz, Alex",artist,1386,Tulips,1969,Oil paint on hardboard,ARTIST ROOMS Acquired jointly with the Nationa...,1969,2008.0,support: 356 x 253 x 3 mm frame: 375 x 273 x 3...,356,253,3.0,mm,date inscribed,© Alex Katz,http://www.tate.org.uk/art/images/work/AR/AR00...,http://www.tate.org.uk/art/artworks/katz-tulip...
1740,98173,AR00003,"Katz, Alex",artist,1386,West Window,1979,Oil paint on hardboard,ARTIST ROOMS Acquired jointly with the Nationa...,1979,2008.0,support: 196 x 238 x 3 mm frame: 216 x 256 x 3...,196,238,3.0,mm,date inscribed,© Alex Katz,http://www.tate.org.uk/art/images/work/AR/AR00...,http://www.tate.org.uk/art/artworks/katz-west-...
1741,98174,AR00004,"Katz, Alex",artist,1386,Lillies Against Yellow House,1983,Oil paint on hardboard,ARTIST ROOMS Acquired jointly with the Nationa...,1983,2008.0,support: 307 x 229 x 3 mm frame: 327 x 249 x 3...,307,229,3.0,mm,date inscribed,© Alex Katz,http://www.tate.org.uk/art/images/work/AR/AR00...,http://www.tate.org.uk/art/artworks/katz-lilli...
1742,98175,AR00005,"Katz, Alex",artist,1386,Young Trees,1989,Oil paint on hardboard,ARTIST ROOMS Acquired jointly with the Nationa...,1989,2008.0,support: 407 x 301 x 3 mm frame: 426 x 320 x 3...,407,301,3.0,mm,date inscribed,© Alex Katz,http://www.tate.org.uk/art/images/work/AR/AR00...,http://www.tate.org.uk/art/artworks/katz-young...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61878,20155,T06453,"Buchheister, Carl",artist,2276,Composition Ursiem,1959,"Graphite, ink, oil paint and adhesive on paper...",Presented by Willi Kemp 1991,1959,1991.0,support: 302 x 389 x 23 mm,302,389,23.0,mm,date inscribed,"© DACS, 2014",http://www.tate.org.uk/art/images/work/T/T06/T...,http://www.tate.org.uk/art/artworks/buchheiste...
62083,20568,T06659,"Gober, Robert",artist,2309,Drains,1990,Pewter,Purchased 1992,1990,1992.0,object: 94 x 94 x 42 mm,94,94,42.0,mm,date inscribed,© Robert Gober,http://www.tate.org.uk/art/images/work/T/T06/T...,http://www.tate.org.uk/art/artworks/gober-drai...
62274,20984,T06858,"Paolozzi, Sir Eduardo",artist,1738,Cyclops,1957,Bronze,Purchased 1994,1957,1994.0,"object: 692 x 370 x 320 mm, 36.4 kg",692,370,320.0,mm,date inscribed,© The Eduardo Paolozzi Foundation,http://www.tate.org.uk/art/images/work/T/T06/T...,http://www.tate.org.uk/art/artworks/paolozzi-c...
62466,21517,T07052,"Irvin, Albert",artist,1342,St Germain,1995,Acrylic paint on canvas,Purchased 1996,1995,1996.0,support: 2140 x 3056 x 52 mm,2140,3056,52.0,mm,date inscribed,© Albert Irvin,http://www.tate.org.uk/art/images/work/T/T07/T...,http://www.tate.org.uk/art/artworks/irvin-st-g...


In [36]:
data.shape


(69201, 20)

In [37]:
# pandas.DataFrame.dropna
# DataFrame.dropna(*, axis=0, how=_NoDefault.no_default, thresh=_NoDefault.no_default, subset=None, inplace=False)
# how='any' (the default) drops a row if any value is missing;
# how='all' drops a row only if every value is missing.
data.dropna(how='all')


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,date not known,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,,1922.0,support: 394 x 419 mm,394,419,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
1,1036,A00002,"Blake, Robert",artist,38,"Two Drawings of Frightened Figures, Probably f...",date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 311 x 213 mm,311,213,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-two-...
2,1037,A00003,"Blake, Robert",artist,38,The Preaching of Warning. Verso: An Old Man En...,?c.1785,Graphite on paper. Verso: graphite on paper,Presented by Mrs John Richmond 1922,1785,1922.0,support: 343 x 467 mm,343,467,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
3,1038,A00004,"Blake, Robert",artist,38,Six Drawings of Figures with Outstretched Arms,date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 318 x 394 mm,318,394,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-six-...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826,1919.0,image: 243 x 335 mm,243,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69196,122960,T13865,"P-Orridge, Genesis",artist,16646,Larvae (from Tampax Romana),1975,"Perspex, Wood, hairpiece, tampon and human blood",Transferred from Tate Archive 2012,1975,2013.0,object: 305 x 305 x 135 mm,305,305,135.0,mm,,,,http://www.tate.org.uk/art/artworks/p-orridge-...
69197,122961,T13866,"P-Orridge, Genesis",artist,16646,Living Womb (from Tampax Romana),1976,"Wood, Perspex, plastic, photograph on paper, t...",Transferred from Tate Archive 2012,1976,2013.0,object: 305 x 305 x 135 mm,305,305,135.0,mm,,,,http://www.tate.org.uk/art/artworks/p-orridge-...
69198,121181,T13867,"Hatoum, Mona",artist,2365,Present Tense,1996,Soap and glass beads,Presented by Tate Members 2013,1996,2013.0,displayed: 45 x 2410 x 2990 mm,45,2410,2990.0,mm,,,,http://www.tate.org.uk/art/artworks/hatoum-pre...
69199,112306,T13868,"Creed, Martin",artist,2760,Work No. 227: The lights going on and off,2000,Gallery lighting,"Purchased with funds provided by Tate Members,...",2000,2013.0,Overall display dimensions variable,,,,,,,,http://www.tate.org.uk/art/artworks/creed-work...


In [38]:
# thresh=15 keeps only rows with at least 15 non-NaN values
# (rows with fewer than 15 non-NaN values are dropped)
data.dropna(thresh=15)


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,date not known,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,,1922.0,support: 394 x 419 mm,394,419,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
1,1036,A00002,"Blake, Robert",artist,38,"Two Drawings of Frightened Figures, Probably f...",date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 311 x 213 mm,311,213,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-two-...
2,1037,A00003,"Blake, Robert",artist,38,The Preaching of Warning. Verso: An Old Man En...,?c.1785,Graphite on paper. Verso: graphite on paper,Presented by Mrs John Richmond 1922,1785,1922.0,support: 343 x 467 mm,343,467,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
3,1038,A00004,"Blake, Robert",artist,38,Six Drawings of Figures with Outstretched Arms,date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922.0,support: 318 x 394 mm,318,394,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-six-...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826,1919.0,image: 243 x 335 mm,243,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69195,122959,T13864,"P-Orridge, Genesis",artist,16646,It’s That Time Of The Month (from Tampax Romana),1975,"Wood, Perspex, clock case, tampons and human b...",Transferred from Tate Archive 2012,1975,2013.0,object: 305 x 305 x 135 mm,305,305,135.0,mm,,© Genesis P-Orridge,http://www.tate.org.uk/art/images/work/T/T13/T...,http://www.tate.org.uk/art/artworks/p-orridge-...
69196,122960,T13865,"P-Orridge, Genesis",artist,16646,Larvae (from Tampax Romana),1975,"Perspex, Wood, hairpiece, tampon and human blood",Transferred from Tate Archive 2012,1975,2013.0,object: 305 x 305 x 135 mm,305,305,135.0,mm,,,,http://www.tate.org.uk/art/artworks/p-orridge-...
69197,122961,T13866,"P-Orridge, Genesis",artist,16646,Living Womb (from Tampax Romana),1976,"Wood, Perspex, plastic, photograph on paper, t...",Transferred from Tate Archive 2012,1976,2013.0,object: 305 x 305 x 135 mm,305,305,135.0,mm,,,,http://www.tate.org.uk/art/artworks/p-orridge-...
69198,121181,T13867,"Hatoum, Mona",artist,2365,Present Tense,1996,Soap and glass beads,Presented by Tate Members 2013,1996,2013.0,displayed: 45 x 2410 x 2990 mm,45,2410,2990.0,mm,,,,http://www.tate.org.uk/art/artworks/hatoum-pre...
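
Because `thresh` counts non-missing values rather than missing ones, it is easy to read backwards. A small sketch on a hypothetical three-column frame (`frame` is made up for illustration):

```python
import numpy as np
import pandas as pd

# Row 0 has 3 non-missing values, row 1 has 1, row 2 has 0.
frame = pd.DataFrame({
    "a": [1, 1, np.nan],
    "b": [2, np.nan, np.nan],
    "c": [3, np.nan, np.nan],
})

kept_2 = frame.dropna(thresh=2)  # keep rows with at least 2 non-missing values
kept_1 = frame.dropna(thresh=1)  # only the all-NaN row is dropped

print(kept_2.index.tolist())
print(kept_1.index.tolist())
```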


In [39]:
# drop based on columns

# drop rows, if year or acquisitionYear is NaN
data.dropna(subset=['year', 'acquisitionYear']).shape



(63781, 20)

In [40]:
# drop rows only if year and acquisitionYear are both NaN
data.dropna(subset=['year', 'acquisitionYear'], how='all').shape


(69198, 20)
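
The difference between the default `how='any'` and `how='all'` is easier to see on a tiny frame. The sketch below uses a made-up three-row DataFrame called `toy`; its values are illustrative only and are not taken from the artwork dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the artwork dataset
toy = pd.DataFrame({
    'year': [1785.0, np.nan, np.nan],
    'acquisitionYear': [1922.0, 1922.0, np.nan],
})

# how='any' (the default): drop a row if either column is NaN
any_dropped = toy.dropna(subset=['year', 'acquisitionYear'], how='any')

# how='all': drop a row only if both columns are NaN
all_dropped = toy.dropna(subset=['year', 'acquisitionYear'], how='all')

print(any_dropped.shape)  # (1, 2)
print(all_dropped.shape)  # (2, 2)
```

Row 1 is missing only `year`, so it survives `how='all'` but not `how='any'`.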

In [41]:
data.shape


(69201, 20)

In [42]:
data.dropna(subset=['year', 'acquisitionYear'], inplace=True)
data.shape


(63781, 20)
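
With `inplace=True` the DataFrame itself is changed and the call returns `None`; without it, `dropna` returns a new frame and the original stays as it was. A minimal sketch on a made-up frame (the name `toy` and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the artwork dataset
toy = pd.DataFrame({
    'year': [1785.0, np.nan, 1826.0],
    'acquisitionYear': [1922.0, 1922.0, 1919.0],
})

# Reassignment style: dropna returns a new frame and leaves `toy` untouched
cleaned = toy.dropna(subset=['year', 'acquisitionYear'])
print(toy.shape)      # (3, 2)
print(cleaned.shape)  # (2, 2)

# inplace=True: the frame itself is modified and the call returns None
result = toy.dropna(subset=['year', 'acquisitionYear'], inplace=True)
print(result)     # None
print(toy.shape)  # (2, 2)
```

Either style gives the same rows; the reassignment style is easier to chain and keeps the original around for comparison.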

## Identifying and Dropping Duplicate Data


In [43]:
sample_data = pd.read_csv('data/artwork_sample.csv')
data = sample_data.copy()

In [44]:
data.head()

Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,date not known,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,,1922,support: 394 x 419 mm,394,419,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
1,1036,A00002,"Blake, Robert",artist,38,"Two Drawings of Frightened Figures, Probably f...",date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922,support: 311 x 213 mm,311,213,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-two-...
2,1037,A00003,"Blake, Robert",artist,38,The Preaching of Warning. Verso: An Old Man En...,?c.1785,Graphite on paper. Verso: graphite on paper,Presented by Mrs John Richmond 1922,1785.0,1922,support: 343 x 467 mm,343,467,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
3,1038,A00004,"Blake, Robert",artist,38,Six Drawings of Figures with Outstretched Arms,date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922,support: 318 x 394 mm,318,394,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-six-...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 243 x 335 mm,243,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...


In [45]:
# by default, drop a row only if it duplicates another row in every column (the first occurrence is kept)
data.drop_duplicates()


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,date not known,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,,1922,support: 394 x 419 mm,394,419,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
1,1036,A00002,"Blake, Robert",artist,38,"Two Drawings of Frightened Figures, Probably f...",date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922,support: 311 x 213 mm,311,213,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-two-...
2,1037,A00003,"Blake, Robert",artist,38,The Preaching of Warning. Verso: An Old Man En...,?c.1785,Graphite on paper. Verso: graphite on paper,Presented by Mrs John Richmond 1922,1785.0,1922,support: 343 x 467 mm,343,467,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
3,1038,A00004,"Blake, Robert",artist,38,Six Drawings of Figures with Outstretched Arms,date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922,support: 318 x 394 mm,318,394,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-six-...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 243 x 335 mm,243,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
5,1040,A00006,"Blake, William",artist,39,Ciampolo the Barrator Tormented by the Devils,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 240 x 338 mm,240,338,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-ciam...
6,1041,A00007,"Blake, William",artist,39,The Baffled Devils Fighting,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 242 x 334 mm,242,334,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
7,1042,A00008,"Blake, William",artist,39,The Six-Footed Serpent Attacking Agnolo Brunel...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 246 x 340 mm,246,340,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
8,1043,A00009,"Blake, William",artist,39,The Serpent Attacking Buoso Donati,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 241 x 335 mm,241,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
9,1044,A00010,"Blake, William",artist,39,The Pit of Disease: The Falsifiers,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 243 x 340 mm,243,340,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...


In [46]:
# drop duplicates based on a subset of columns

# keep the first row for each artist and drop the later rows that repeat that artist
data.drop_duplicates(subset=['artist'])

Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,date not known,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,,1922,support: 394 x 419 mm,394,419,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 243 x 335 mm,243,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...


In [47]:
data.drop_duplicates(subset=['artist'], keep='first')


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,date not known,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,,1922,support: 394 x 419 mm,394,419,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 243 x 335 mm,243,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...


In [48]:
data.drop_duplicates(subset=['artist'], keep='last')


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
3,1038,A00004,"Blake, Robert",artist,38,Six Drawings of Figures with Outstretched Arms,date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922,support: 318 x 394 mm,318,394,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-six-...
9,1044,A00010,"Blake, William",artist,39,The Pit of Disease: The Falsifiers,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 243 x 340 mm,243,340,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...


In [49]:
data.drop_duplicates(subset=['artist'], keep=False)


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
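
The result above is empty because `keep=False` drops every row whose `artist` value appears more than once, and in this sample both artists appear several times. The three `keep` options can be compared side by side in a small sketch; the frame below is made up for illustration, and the third artist name is hypothetical.

```python
import pandas as pd

# Hypothetical toy frame; 'Solo, Artist' is an invented name that appears once
toy = pd.DataFrame({
    'artist': ['Blake, Robert', 'Blake, Robert', 'Blake, William', 'Solo, Artist'],
    'title': ['A', 'B', 'C', 'D'],
})

# keep='first' (default): keep the first row of each artist
first = toy.drop_duplicates(subset=['artist'], keep='first')

# keep='last': keep the last row of each artist
last = toy.drop_duplicates(subset=['artist'], keep='last')

# keep=False: drop every row whose artist is repeated anywhere
none_kept = toy.drop_duplicates(subset=['artist'], keep=False)

print(list(first.index))      # [0, 2, 3]
print(list(last.index))       # [1, 2, 3]
print(list(none_kept.index))  # [2, 3]
```

Only the artists that occur exactly once survive `keep=False`.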


In [50]:
data


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,date not known,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,,1922,support: 394 x 419 mm,394,419,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
1,1036,A00002,"Blake, Robert",artist,38,"Two Drawings of Frightened Figures, Probably f...",date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922,support: 311 x 213 mm,311,213,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-two-...
2,1037,A00003,"Blake, Robert",artist,38,The Preaching of Warning. Verso: An Old Man En...,?c.1785,Graphite on paper. Verso: graphite on paper,Presented by Mrs John Richmond 1922,1785.0,1922,support: 343 x 467 mm,343,467,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
3,1038,A00004,"Blake, Robert",artist,38,Six Drawings of Figures with Outstretched Arms,date not known,Graphite on paper,Presented by Mrs John Richmond 1922,,1922,support: 318 x 394 mm,318,394,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-six-...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 243 x 335 mm,243,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
5,1040,A00006,"Blake, William",artist,39,Ciampolo the Barrator Tormented by the Devils,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 240 x 338 mm,240,338,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-ciam...
6,1041,A00007,"Blake, William",artist,39,The Baffled Devils Fighting,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 242 x 334 mm,242,334,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
7,1042,A00008,"Blake, William",artist,39,The Six-Footed Serpent Attacking Agnolo Brunel...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 246 x 340 mm,246,340,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
8,1043,A00009,"Blake, William",artist,39,The Serpent Attacking Buoso Donati,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 241 x 335 mm,241,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...
9,1044,A00010,"Blake, William",artist,39,The Pit of Disease: The Falsifiers,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 243 x 340 mm,243,340,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...


In [51]:
data.drop_duplicates(subset=['artist'], keep='first', inplace=True)


In [52]:
data


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
0,1035,A00001,"Blake, Robert",artist,38,A Figure Bowing before a Seated Old Man with h...,date not known,"Watercolour, ink, chalk and graphite on paper....",Presented by Mrs John Richmond 1922,,1922,support: 394 x 419 mm,394,419,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-a-fi...
4,1039,A00005,"Blake, William",artist,39,The Circle of the Lustful: Francesca da Rimini...,"1826–7, reprinted 1892",Line engraving on paper,Purchased with the assistance of a special gra...,1826.0,1919,image: 243 x 335 mm,243,335,,mm,,,http://www.tate.org.uk/art/images/work/A/A00/A...,http://www.tate.org.uk/art/artworks/blake-the-...


In [53]:
# start with full dataset
data = original_data.copy()


In [54]:
data.shape


(69201, 20)

In [55]:
data.duplicated()


0        False
1        False
2        False
3        False
4        False
         ...  
69196    False
69197    False
69198    False
69199    False
69200    False
Length: 69201, dtype: bool

In [56]:
data.loc[data.duplicated()]


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
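
An empty result here means no two rows are identical across all 20 columns. Because `duplicated()` returns a boolean Series, it can also be summarised without printing the rows. A small sketch on a made-up frame (names and values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy frame with one fully duplicated row
toy = pd.DataFrame({
    'artist': ['A', 'A', 'B'],
    'title': ['x', 'x', 'y'],
})

mask = toy.duplicated()  # keep='first': the first occurrence is not flagged

print(mask.tolist())     # [False, True, False]
print(int(mask.sum()))   # 1, the number of rows flagged as duplicates
print(bool(mask.any()))  # True, at least one duplicate exists
```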


In [57]:
data.duplicated(subset=['artist', 'title'], keep=False)


0        False
1        False
2        False
3        False
4         True
         ...  
69196    False
69197    False
69198    False
69199    False
69200    False
Length: 69201, dtype: bool

In [58]:
data.loc[data.duplicated(subset=['artist', 'title'], keep=False), ['artist', 'title']]


Unnamed: 0,artist,title
4,"Blake, William",The Circle of the Lustful: Francesca da Rimini...
5,"Blake, William",Ciampolo the Barrator Tormented by the Devils
6,"Blake, William",The Baffled Devils Fighting
7,"Blake, William",The Six-Footed Serpent Attacking Agnolo Brunel...
8,"Blake, William",The Serpent Attacking Buoso Donati
...,...,...
69169,"Barlow, Phyllida",Untitled
69170,"Barlow, Phyllida",Untitled
69171,"Barlow, Phyllida",Untitled
69172,"Barlow, Phyllida",Untitled
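
Passing `keep=False` to `duplicated()` flags every member of a duplicated group, not just the repeats, which is useful for inspecting the groups. One possible follow-up is to count how many rows share each `artist` and `title` pair. The sketch below assumes a small made-up frame; the rows are illustrative, not real records.

```python
import pandas as pd

# Hypothetical toy frame, not the artwork dataset
toy = pd.DataFrame({
    'artist': ['Barlow, Phyllida', 'Barlow, Phyllida',
               'Barlow, Phyllida', 'Blake, William'],
    'title': ['Untitled', 'Untitled', 'Object', 'The Baffled Devils Fighting'],
})

# keep=False flags all rows in a duplicated group
mask_all = toy.duplicated(subset=['artist', 'title'], keep=False)

# keep='first' (default) flags only the second and later occurrences
mask_first = toy.duplicated(subset=['artist', 'title'])

# Count the rows in each duplicated (artist, title) group
counts = toy.loc[mask_all].groupby(['artist', 'title']).size()

print(mask_all.tolist())    # [True, True, False, False]
print(mask_first.tolist())  # [False, True, False, False]
print(counts)
```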


In [71]:
data.loc[data.title.str.contains('akaba', regex=True, na=False, case=False), :]


Unnamed: 0,id,accession_number,artist,artistRole,artistId,title,dateText,medium,creditLine,year,acquisitionYear,dimensions,width,height,depth,units,inscription,thumbnailCopyright,thumbnailUrl,url
44345,1423,N06252,"Brabazon, Hercules Brabazon",artist,50,Akaba,date not known,Watercolour and gouache on paper,Bequeathed by Sir Victor A.A.H. Wellesley Bt 1954,,1954.0,support: 254 x 330 mm,254.0,330.0,,mm,,,http://www.tate.org.uk/art/images/work/N/N06/N...,http://www.tate.org.uk/art/artworks/brabazon-a...
55574,120563,P80252,"Ojeikere, J.D. Okhai",artist,16018,Untitled (Onile Gogoro Or Akaba),1975,"Photograph, gelatin silver print on paper",Purchased with funds provided by the Acquisiti...,1975.0,2013.0,unconfirmed,,,,,,,,http://www.tate.org.uk/art/artworks/ojeikere-u...
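
In the filter above, `case=False` makes the match case-insensitive and `na=False` treats missing titles as non-matches, so the mask is safe to use for row selection. `regex=True` is the default and could be left out. A minimal sketch on a made-up Series (the missing value is added for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical titles, including one missing value
titles = pd.Series([
    'Akaba',
    'Untitled (Onile Gogoro Or Akaba)',
    np.nan,
    'Present Tense',
])

# Case-insensitive substring match; missing titles count as False
mask = titles.str.contains('akaba', case=False, na=False)

print(mask.tolist())           # [True, True, False, False]
print(titles[mask].tolist())   # ['Akaba', 'Untitled (Onile Gogoro Or Akaba)']
```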
