# Defining a schema

- Defining a schema up front helps with data quality and import performance. As mentioned during the lesson, we'll create a simple schema to read in the following columns:
> - Name
> - Age
> - City

- The `Name` and `City` columns are `StringType()` and the `Age` column is `IntegerType()`.

## Instructions

- Import * from the `pyspark.sql.types` library.
- Define a new schema using the `StructType` class.
- Define a `StructField` for `name`, `age`, and `city`. Each field should have the correct data type and be non-nullable.

In [2]:
# Initialization
import os
import sys

os.environ["SPARK_HOME"] = "/home/talentum/spark"
os.environ["PYLIB"] = os.environ["SPARK_HOME"] + "/python/lib"
# In the two lines below, use /usr/bin/python2.7 if you want to use Python 2
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.6"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3"
sys.path.insert(0, os.environ["PYLIB"] +"/py4j-0.10.7-src.zip")
sys.path.insert(0, os.environ["PYLIB"] +"/pyspark.zip")

# NOTE: List whichever packages you need here.
# os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.11:0.6.0 pyspark-shell' 
# os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-avro_2.11:2.4.0 pyspark-shell'
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.11:0.6.0,org.apache.spark:spark-avro_2.11:2.4.3 pyspark-shell'
# os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.databricks:spark-xml_2.11:0.6.0,org.apache.spark:spark-avro_2.11:2.4.0 pyspark-shell'

In [3]:
# Entry point for Spark 2.x
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Spark SQL basic example").enableHiveSupport().getOrCreate()

# To run on YARN, specify .master("yarn"):
# spark = SparkSession.builder.appName("Spark SQL basic example").enableHiveSupport().master("yarn").getOrCreate()

sc = spark.sparkContext

In [12]:
# Import the required classes from pyspark.sql.types
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Define a new schema using the StructType class
people_schema = StructType([
    # Define a StructField for each column: (name, data type, nullable)
    StructField('name', StringType(), False),
    StructField('age', IntegerType(), False),
    StructField('city', StringType(), False)
])

## Implementation of schema

In [13]:
# Load the data, letting Spark infer the schema (inferSchema=True)
file_path = 'file:////home/talentum/test-jupyter/P3/M1/SM1/people.csv'
df1 = spark.read.csv(file_path,header=True, inferSchema=True)
df1.printSchema()

root
 |-- _c0: integer (nullable = true)
 |-- person_id: integer (nullable = true)
 |-- name: string (nullable = true)
 |-- sex: string (nullable = true)
 |-- date of birth: string (nullable = true)



In [17]:
# Save a subset as CSV (Spark writes a directory of part files, with no header row by default)
df2 = df1.select('person_id','name')
df2.write.csv('file:////home/talentum/test-jupyter/P3/M1/SM1/test.csv')

In [16]:
cat ~/test-jupyter/P3/M1/SM1/test.csv/part-00000-4248e90a-578b-4a3b-8145-c32e36c675da-c000.csv

100,Penelope Lewis
101,David Anthony
102,Ida Shipp
103,Joanna Moore
104,Lisandra Ortiz
105,David Simmons
106,Edward Hudson
107,Albert Jones
108,Leonard Cavender
109,Everett Vadala
110,Freddie Claridge
111,Annabelle Rosseau
112,Eulah Emanuel
113,Shaun Love
114,Alejandro Brennan
115,Robert Mcreynolds
116,Carla Spickard
117,Florence Eberhart
118,Tina Gaskins
119,Florence Mulhern
120,Joel Smith
121,Evelyn Kriner
122,Heather Luce
123,Angel Moher
124,Charles Leonard
125,Mark Miller
126,Marion Baca
127,Devona Kay
128,Betty Endicott
129,David Bishop
130,Jane Ross
131,Joseph Windus
132,Christopher Gilbert
133,Robert Salisbury
134,Pauline Steele
135,Anne Novotny
136,Wilbert Glass
137,Carol Noble
138,Constance Fulmer
139,Renee Simon
140,Juan Dunn
141,Kristie Price
142,Thomas Nichols
143,Kimberly Harms
144,Lyle Murray
145,Michele Stephens
146,Lee Finney
147,Michael Coffin
148,Martha Jordan
149,Corinne Hansen
150,Drew Rowe
151,Margaret Belk
...
[remaining output omitted]

In [20]:
# Inferring the schema takes time because Spark has to read through the data to
# determine the type of each column, so a predefined schema is used instead.

# Define the schema
schema = StructType([
    # Define a StructField for each column: (name, data type, nullable)
    StructField('person_id', IntegerType(), False),
    StructField('name', StringType(), False)
])

file_path2 = 'file:////home/talentum/test-jupyter/P3/M1/SM1/test.csv'
## Use schema to load data
df2 = spark.read.csv(file_path2, schema=schema)
df2.show()

+---------+-----------------+
|person_id|             name|
+---------+-----------------+
|      100|   Penelope Lewis|
|      101|    David Anthony|
|      102|        Ida Shipp|
|      103|     Joanna Moore|
|      104|   Lisandra Ortiz|
|      105|    David Simmons|
|      106|    Edward Hudson|
|      107|     Albert Jones|
|      108| Leonard Cavender|
|      109|   Everett Vadala|
|      110| Freddie Claridge|
|      111|Annabelle Rosseau|
|      112|    Eulah Emanuel|
|      113|       Shaun Love|
|      114|Alejandro Brennan|
|      115|Robert Mcreynolds|
|      116|   Carla Spickard|
|      117|Florence Eberhart|
|      118|     Tina Gaskins|
|      119| Florence Mulhern|
+---------+-----------------+
only showing top 20 rows
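
The cost of inference noted in the last cell can be illustrated with a plain-Python analogy. This is not how Spark implements `inferSchema`; it is a minimal sketch, using only the standard library, of why inferring types needs a scan over the data before parsing, while a predefined schema needs only the parse. The function names and the three sample rows (copied from the output above) are chosen for illustration.

```python
import csv
import io

# A tiny in-memory CSV standing in for test.csv (no header row)
SAMPLE = "100,Penelope Lewis\n101,David Anthony\n102,Ida Shipp\n"

def infer_types(text):
    """Scan every row to decide whether each column is int or str."""
    types = None
    for row in csv.reader(io.StringIO(text)):
        if types is None:
            types = [int] * len(row)
        for i, value in enumerate(row):
            try:
                int(value)
            except ValueError:
                types[i] = str  # one non-integer value makes the column a string
    return types

def parse(text, types):
    """Single pass: convert each value with the known column types."""
    return [[t(v) for t, v in zip(types, row)]
            for row in csv.reader(io.StringIO(text))]

# Inference: one pass to find the types, then a second pass to parse
inferred = infer_types(SAMPLE)
rows_inferred = parse(SAMPLE, inferred)

# Explicit schema: types are known up front, so only the parse pass is needed
rows_explicit = parse(SAMPLE, [int, str])

print(inferred)
print(rows_explicit[0])
```

Both routes produce the same rows here; the difference is only the extra scan, which grows with the size of the file.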

