# UFO Analysis

## Abstract

The goal of this project is to explore the UFO data across US states and to get some meaningful insights.

For this project we will be working with **Kaggle's UFO Sightings** dataset- https://www.kaggle.com/NUFORC/ufo-sightings

Following questions will be answered-
* When and where do people see UFOs in California?
* What is the average length of UFO sightings across the U.S. states?
* Differences in the circumstances of UFO sightings across the U.S. states?

#### To enable multiple output

In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [2]:
import pandas as pd

## Reading the UFO Data and Cleaning

In [3]:
df=pd.read_csv('scrubbed.csv', low_memory=False)
df.head(2)

Unnamed: 0,datetime,city,state,country,shape,duration (seconds),duration (hours/min),comments,date posted,latitude,longitude
0,10/10/1949 20:30,san marcos,tx,us,cylinder,2700,45 minutes,This event took place in early fall around 194...,4/27/2004,29.8830556,-97.941111
1,10/10/1949 21:00,lackland afb,tx,,light,7200,1-2 hrs,1949 Lackland AFB&#44 TX. Lights racing acros...,12/16/2005,29.38421,-98.581082


#### Column renaming to make it more user-friendly

In [4]:
df=df.rename(columns={'datetime':'sighted_on', 'duration (seconds)':'duration_s', 
                      'duration (hours/min)':'duration_hm','date posted':'posted_on'})

In [5]:
df=df.set_index('sighted_on')

#### Preprocessing data by transforming the columns 

In [6]:
df['city']=df['city'].str.title()
df['state']=df['state'].str.upper()
df['country']=df['country'].str.upper()

## Sighting in California at various times

In [7]:
Sighting_Cal=df[df['state']=='CA']
Sighting_Cal.head(5)

Unnamed: 0_level_0,city,state,country,shape,duration_s,duration_hm,comments,posted_on,latitude,longitude
sighted_on,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
10/10/1968 13:00,Hawthorne,CA,US,circle,300,5 min.,ROUND &#44 ORANGE &#44 WITH WHAT I WOULD SAY W...,10/31/2003,33.9163889,-118.351667
10/10/1979 22:00,San Diego,CA,US,oval,180,3 minutes,My 2nd UFO sighting&#44 October 1979,8/5/2001,32.7152778,-117.156389
10/10/1989 00:00,Calabasas,CA,US,disk,300,approx 5 min,Unidentified object on Mulholland Highway.,12/14/2004,34.1577778,-118.6375
10/10/1995 22:40,Oakland,CA,US,,60,1 minute,Woman repts. bright light in NW sky&#44 sudde...,11/2/1999,37.8044444,-122.269722
10/10/1998 02:30,Hollywood,CA,US,changing,300,5 minutes,I was standing outside on Sunset Blvd. at Vine...,11/1/1998,34.0983333,-118.325833


## Average length of UFO sightings across the U.S. states

#### Transforming the duration data into numeric value 

In [8]:
df['duration_s'] = pd.to_numeric(df['duration_s'],errors='coerce')

#### Calculating the average sighting time

In [9]:
time=df['duration_s'].mean()
print("Average sighting time is %2f seconds" % time)

Average sighting time is 9017.225634 seconds


#### Average sighting duration in the format of Days:Hours:Minutes:Seconds

In [10]:
day = time // (24 * 3600)
time = time % (24 * 3600)
hour = time // 3600
time %= 3600
minutes = time // 60
time %= 60
seconds = time
print("d:h:m:s-> %d:%d:%d:%d" % (day, hour, minutes, seconds))

d:h:m:s-> 0:2:30:17


## Analyzing UFO Sighting Comments

#### Writing comments in a text file

In [11]:
import codecs
file = codecs.open("comments.txt", "w")
for i in df.comments:
    file.write(i)
    file.write("\n")
file.close() 

135

1

95

1

51

1

138

1

154

1

138

1

130

1

113

1

119

1

96

1

95

1

93

1

40

1

76

1

66

1

143

1

125

1

135

1

75

1

135

1

138

1

20

1

62

1

56

1

37

1

135

1

58

1

58

1

137

1

96

1

36

1

138

1

112

1

50

1

44

1

43

1

24

1

43

1

135

1

76

1

105

1

110

1

48

1

99

1

98

1

42

1

115

1

43

1

50

1

135

1

54

1

131

1

128

1

76

1

94

1

135

1

142

1

34

1

62

1

103

1

101

1

18

1

147

1

137

1

73

1

54

1

39

1

40

1

111

1

100

1

45

1

135

1

134

1

93

1

135

1

66

1

37

1

15

1

134

1

98

1

134

1

135

1

141

1

24

1

44

1

89

1

76

1

39

1

111

1

54

1

110

1

135

1

97

1

49

1

61

1

21

1

22

1

138

1

117

1

91

1

60

1

36

1

135

1

20

1

98

1

65

1

113

1

151

1

130

1

140

1

72

1

134

1

18

1

20

1

126

1

102

1

26

1

97

1

52

1

53

1

49

1

31

1

70

1

113

1

86

1

121

1

38

1

103

1

77

1

116

1

140

1

55

1

19

1

54

1

135

1

70

1

133

1

99

1

116

1

52

1

60

1

124

1

135

1

95

1

45

1

147

1

59

1

51

1

36

1

54

1

67

1

44

1

141

1

31

1

135

1

68

1

75

1

33

1

99

1

95

1

51

1

135

1

93

1

93

1

144

1

119

1

14

1

82

1

91

1

108

1

60

1

125

1

103

1

103

1

112

1

130

1

130

1

123

1

123

1

42

1

32

1

69

1

115

1

68

1

80

1

83

1

65

1

67

1

85

1

94

1

146

1

74

1

65

1

52

1

49

1

136

1

47

1

135

1

125

1

80

1

74

1

42

1

114

1

106

1

31

1

76

1

135

1

20

1

73

1

73

1

116

1

107

1

70

1

21

1

122

1

38

1

66

1

79

1

82

1

80

1

64

1

42

1

135

1

102

1

69

1

58

1

88

1

28

1

26

1

70

1

137

1

66

1

126

1

74

1

41

1

25

1

105

1

47

1

36

1

52

1

24

1

28

1

66

1

86

1

33

1

39

1

28

1

103

1

116

1

68

1

126

1

95

1

48

1

49

1

121

1

135

1

31

1

138

1

47

1

65

1

135

1

118

1

76

1

48

1

19

1

43

1

20

1

137

1

31

1

54

1

141

1

24

1

32

1

61

1

54

1

147

1

126

1

74

1

135

1

61

1

107

1

79

1

50

1

57

1

44

1

58

1

95

1

26

1

87

1

49

1

107

1

95

1

137

1

135

1

159

1

80

1

18

1

64

1

143

1

63

1

84

1

135

1

47

1

72

1

135

1

66

1

134

1

139

1

76

1

134

1

135

1

135

1

134

1

145

1

117

1

135

1

135

1

102

1

140

1

40

1

93

1

66

1

135

1

97

1

19

1

32

1

109

1

115

1

26

1

104

1

50

1

111

1

71

1

94

1

40

1

73

1

73

1

138

1

102

1

57

1

97

1

36

1

66

1

66

1

66

1

45

1

70

1

68

1

134

1

112

1

98

1

66

1

132

1

101

1

28

1

37

1

89

1

67

1

46

1

45

1

17

1

34

1

68

1

128

1

126

1

42

1

49

1

94

1

119

1

43

1

86

1

115

1

112

1

43

1

44

1

55

1

135

1

139

1

135

1

134

1

36

1

28

1

88

1

135

1

72

1

20

1

48

1

134

1

143

1

39

1

33

1

95

1

42

1

140

1

64

1

130

1

26

1

141

1

38

1

90

1

85

1

88

1

100

1

27

1

33

1

16

1

35

1

61

1

73

1

31

1

30

1

133

1

117

1

49

1

135

1

51

1

76

1

135

1

135

1

51

1

72

1

113

1

51

1

20

1

59

1

29

1

129

1

135

1

49

1

62

1

57

1

137

1

143

1

72

1

46

1

72

1

80

1

48

1

15

1

48

1

119

1

67

1

86

1

138

1

27

1

24

1

146

1

29

1

24

1

135

1

146

1

106

1

109

1

94

1

77

1

93

1

58

1

42

1

130

1

143

1

135

1

35

1

63

1

32

1

60

1

90

1

53

1

109

1

91

1

53

1

75

1

47

1

133

1

135

1

47

1

39

1

41

1

24

1

114

1

43

1

62

1

53

1

136

1

114

1

62

1

142

1

121

1

26

1

110

1

122

1

122

1

114

1

102

1

86

1

65

1

91

1

138

1

117

1

138

1

138

1

109

1

53

1

141

1

52

1

68

1

138

1

135

1

134

1

25

1

135

1

69

1

15

1

118

1

66

1

112

1

92

1

28

1

135

1

110

1

49

1

50

1

103

1

130

1

96

1

102

1

143

1

141

1

32

1

151

1

124

1

138

1

138

1

126

1

138

1

104

1

84

1

101

1

88

1

135

1

77

1

140

1

138

1

146

1

135

1

89

1

121

1

60

1

78

1

80

1

27

1

119

1

67

1

138

1

44

1

152

1

101

1

53

1

79

1

48

1

141

1

33

1

81

1

88

1

84

1

70

1

73

1

92

1

50

1

141

1

100

1

151

1

77

1

124

1

68

1

73

1

41

1

59

1

84

1

137

1

101

1

12

1

103

1

74

1

128

1

40

1

56

1

88

1

115

1

112

1

25

1

45

1

83

1

110

1

102

1

135

1

54

1

138

1

99

1

113

1

128

1

87

1

108

1

35

1

95

1

89

1

82

1

81

1

14

1

48

1

63

1

138

1

149

1

102

1

94

1

22

1

67

1

116

1

89

1

90

1

113

1

27

1

56

1

135

1

28

1

53

1

68

1

25

1

55

1

116

1

135

1

144

1

62

1

156

1

16

1

100

1

144

1

138

1

91

1

135

1

70

1

141

1

106

1

121

1

135

1

56

1

49

1

102

1

20

1

144

1

63

1

153

1

66

1

42

1

138

1

81

1

135

1

126

1

58

1

85

1

138

1

140

1

118

1

121

1

82

1

135

1

31

1

70

1

104

1

56

1

87

1

138

1

81

1

78

1

57

1

135

1

99

1

95

1

51

1

29

1

117

1

59

1

25

1

41

1

62

1

74

1

25

1

109

1

105

1

98

1

78

1

109

1

128

1

47

1

135

1

141

1

25

1

73

1

67

1

72

1

104

1

138

1

63

1

135

1

59

1

118

1

29

1

23

1

135

1

86

1

133

1

126

1

135

1

45

1

110

1

100

1

150

1

17

1

92

1

26

1

58

1

135

1

77

1

109

1

65

1

138

1

101

1

89

1

38

1

48

1

65

1

18

1

76

1

23

1

59

1

55

1

83

1

68

1

60

1

33

1

80

1

6

1

70

1

34

1

138

1

70

1

14

1

46

1

45

1

139

1

138

1

119

1

42

1

29

1

125

1

47

1

27

1

32

1

89

1

134

1

104

1

83

1

31

1

33

1

79

1

68

1

36

1

79

1

33

1

130

1

54

1

135

1

39

1

128

1

34

1

30

1

61

1

141

1

91

1

55

1

63

1

109

1

86

1

53

1

56

1

34

1

49

1

88

1

74

1

43

1

43

1

141

1

49

1

57

1

75

1

119

1

39

1

120

1

30

1

138

1

66

1

29

1

56

1

127

1

135

1

125

1

141

1

128

1

144

1

67

1

65

1

135

1

53

1

43

1

140

1

74

1

138

1

31

1

91

1

135

1

46

1

118

1

51

1

137

1

31

1

82

1

54

1

57

1

140

1

90

1

102

1

122

1

137

1

130

1

116

1

125

1

38

1

61

1

138

1

118

1

134

1

128

1

30

1

37

1

41

1

39

1

101

1

129

1

156

1

85

1

88

1

135

1

120

1

135

1

46

1

55

1

57

1

131

1

46

1

61

1

93

1

30

1

29

1

86

1

25

1

34

1

33

1

85

1

44

1

77

1

151

1

79

1

89

1

77

1

99

1

138

1

11

1

59

1

135

1

72

1

27

1

141

1

85

1

63

1

36

1

105

1

40

1

53

1

20

1

25

1

46

1

81

1

90

1

98

1

35

1

40

1

49

1

52

1

71

1

79

1

55

1

116

1

27

1

54

1

33

1

59

1

49

1

41

1

48

1

46

1

93

1

73

1

98

1

99

1

53

1

27

1

63

1

58

1

74

1

94

1

64

1

59

1

26

1

122

1

49

1

156

1

142

1

79

1

57

1

34

1

39

1

83

1

55

1

136

1

33

1

20

1

64

1

141

1

71

1

47

1

68

1

131

1

141

1

135

1

143

1

96

1

141

1

135

1

20

1

127

1

10

1

26

1

42

1

96

1

34

1

67

1

131

1

28

1

49

1

69

1

131

1

113

1

91

1

60

1

61

1

87

1

143

1

69

1

39

1

33

1

93

1

140

1

140

1

63

1

71

1

90

1

93

1

93

1

73

1

137

1

98

1

118

1

74

1

44

1

61

1

64

1

87

1

51

1

17

1

83

1

116

1

21

1

46

1

15

1

138

1

134

1

135

1

102

1

89

1

79

1

135

1

78

1

129

1

145

1

39

1

44

1

138

1

107

1

140

1

30

1

135

1

138

1

135

1

27

1

77

1

143

1

30

1

75

1

92

1

131

1

137

1

138

1

142

1

133

1

135

1

93

1

70

1

138

1

60

1

115

1

28

1

80

1

140

1

147

1

19

1

73

1

139

1

115

1

64

1

92

1

117

1

138

1

27

1

141

1

55

1

125

1

99

1

102

1

124

1

144

1

69

1

37

1

71

1

33

1

24

1

35

1

65

1

31

1

138

1

72

1

53

1

43

1

16

1

42

1

64

1

132

1

126

1

31

1

28

1

107

1

118

1

135

1

21

1

63

1

121

1

24

1

46

1

49

1

140

1

129

1

63

1

127

1

95

1

91

1

61

1

39

1

143

1

83

1

36

1

38

1

27

1

117

1

142

1

36

1

46

1

71

1

75

1

113

1

75

1

51

1

107

1

51

1

134

1

124

1

43

1

135

1

38

1

17

1

35

1

112

1

39

1

49

1

45

1

38

1

73

1

30

1

45

1

79

1

43

1

105

1

42

1

70

1

59

1

68

1

61

1

68

1

80

1

135

1

89

1

146

1

82

1

80

1

48

1

56

1

34

1

50

1

138

1

78

1

49

1

49

1

88

1

48

1

82

1

30

1

66

1

131

1

83

1

22

1

100

1

100

1

92

1

138

1

90

1

40

1

43

1

80

1

135

1

64

1

32

1

21

1

107

1

55

1

18

1

32

1

129

1

80

1

47

1

60

1

47

1

56

1

55

1

46

1

51

1

97

1

115

1

27

1

99

1

84

1

77

1

86

1

47

1

54

1

134

1

68

1

44

1

133

1

45

1

131

1

59

1

114

1

145

1

47

1

35

1

49

1

117

1

26

1

18

1

135

1

101

1

32

1

18

1

84

1

96

1

66

1

90

1

49

1

50

1

51

1

75

1

64

1

135

1

112

1

103

1

130

1

119

1

94

1

36

1

137

1

60

1

140

1

82

1

25

1

125

1

61

1

138

1

128

1

63

1

144

1

135

1

135

1

89

1

100

1

148

1

141

1

141

1

82

1

138

1

28

1

57

1

53

1

105

1

135

1

135

1

138

1

140

1

93

1

143

1

137

1

88

1

37

1

61

1

146

1

49

1

134

1

134

1

40

1

66

1

141

1

114

1

118

1

38

1

18

1

38

1

59

1

135

1

138

1

27

1

41

1

86

1

138

1

85

1

19

1

67

1

54

1

65

1

45

1

29

1

44

1

70

1

31

1

141

1

46

1

127

1

138

1

41

1

91

1

84

1

46

1

66

1

29

1

36

1

26

1

97

1

138

1

102

1

18

1

53

1

33

1

33

1

46

1

34

1

135

1

88

1

40

1

27

1

53

1

34

1

140

1

44

1

124

1

104

1

137

1

74

1

69

1

105

1

86

1

103

1

110

1

48

1

129

1

24

1

68

1

30

1

121

1

103

1

117

1

54

1

104

1

73

1

43

1

109

1

30

1

61

1

136

1

143

1

134

1

48

1

131

1

26

1

99

1

64

1

135

1

88

1

81

1

106

1

86

1

134

1

135

1

56

1

34

1

26

1

110

1

133

1

81

1

112

1

97

1

95

1

94

1

56

1

72

1

58

1

135

1

134

1

99

1

135

1

140

1

55

1

135

1

105

1

91

1

144

1

48

1

22

1

49

1

71

1

29

1

29

1

52

1

99

1

76

1

98

1

44

1

98

1

40

1

100

1

60

1

105

1

136

1

95

1

72

1

94

1

78

1

41

1

32

1

135

1

60

1

135

1

36

1

114

1

60

1

46

1

46

1

87

1

86

1

68

1

30

1

51

1

117

1

54

1

100

1

38

1

65

1

43

1

61

1

44

1

138

1

34

1

139

1

76

1

21

1

148

1

97

1

57

1

106

1

81

1

46

1

33

1

48

1

138

1

51

1

71

1

99

1

84

1

59

1

122

1

13

1

48

1

144

1

141

1

43

1

135

1

42

1

74

1

11

1

105

1

32

1

133

1

63

1

135

1

54

1

89

1

115

1

144

1

101

1

111

1

56

1

133

1

81

1

76

1

59

1

87

1

135

1

38

1

41

1

66

1

129

1

35

1

137

1

56

1

39

1

134

1

52

1

53

1

135

1

84

1

30

1

131

1

136

1

25

1

141

1

105

1

144

1

68

1

78

1

113

1

107

1

78

1

113

1

164

1

79

1

138

1

159

1

116

1

119

1

146

1

137

1

83

1

142

1

133

1

102

1

148

1

137

1

13

1

52

1

135

1

135

1

135

1

135

1

138

1

63

1

57

1

80

1

22

1

135

1

57

1

33

1

138

1

109

1

115

1

55

1

65

1

135

1

94

1

57

1

53

1

81

1

110

1

131

1

138

1

129

1

42

1

107

1

123

1

118

1

73

1

91

1

131

1

128

1

79

1

136

1

101

1

77

1

138

1

75

1

85

1

115

1

65

1

116

1

90

1

85

1

138

1

126

1

61

1

135

1

105

1

138

1

92

1

101

1

42

1

139

1

95

1

49

1

141

1

141

1

73

1

35

1

70

1

135

1

76

1

84

1

108

1

50

1

130

1

95

1

19

1

71

1

135

1

63

1

127

1

135

1

77

1

71

1

32

1

85

1

35

1

99

1

64

1

83

1

107

1

96

1

70

1

40

1

42

1

134

1

42

1

55

1

132

1

35

1

74

1

146

1

78

1

50

1

138

1

51

1

112

1

63

1

31

1

141

1

56

1

80

1

27

1

51

1

62

1

111

1

91

1

49

1

53

1

107

1

72

1

63

1

15

1

66

1

81

1

83

1

55

1

110

1

45

1

91

1

10

1

31

1

63

1

107

1

79

1

51

1

72

1

26

1

75

1

41

1

114

1

82

1

112

1

38

1

75

1

79

1

27

1

35

1

141

1

116

1

59

1

124

1

58

1

41

1

65

1

68

1

96

1

104

1

68

1

102

1

56

1

135

1

114

1

25

1

136

1

70

1

63

1

52

1

140

1

140

1

84

1

74

1

134

1

49

1

51

1

83

1

60

1

53

1

43

1

118

1

44

1

43

1

86

1

33

1

135

1

67

1

96

1

114

1

69

1

130

1

134

1

131

1

131

1

83

1

74

1

46

1

55

1

50

1

42

1

44

1

45

1

141

1

43

1

26

1

119

1

113

1

118

1

61

1

58

1

144

1

35

1

77

1

138

1

134

1

109

1

102

1

112

1

135

1

81

1

49

1

134

1

28

1

22

1

41

1

134

1

43

1

102

1

133

1

126

1

127

1

135

1

89

1

141

1

124

1

110

1

67

1

65

1

34

1

47

1

147

1

83

1

46

1

76

1

68

1

76

1

64

1

74

1

34

1

42

1

135

1

98

1

48

1

40

1

142

1

135

1

29

1

37

1

142

1

124

1

101

1

79

1

160

1

29

1

41

1

72

1

34

1

87

1

22

1

131

1

43

1

135

1

126

1

125

1

58

1

37

1

83

1

32

1

88

1

66

1

103

1

12

1

60

1

106

1

43

1

115

1

38

1

110

1

83

1

38

1

64

1

68

1

135

1

112

1

20

1

71

1

83

1

19

1

58

1

82

1

42

1

86

1

36

1

35

1

45

1

138

1

135

1

42

1

119

1

22

1

135

1

149

1

134

1

144

1

95

1

60

1

90

1

115

1

129

1

135

1

38

1

83

1

81

1

22

1

75

1

84

1

94

1

142

1

63

1

115

1

61

1

111

1

135

1

103

1

138

1

135

1

105

1

67

1

145

1

130

1

34

1

44

1

125

1

21

1

51

1

65

1

58

1

131

1

67

1

141

1

138

1

56

1

37

1

69

1

94

1

59

1

54

1

122

1

91

1

101

1

138

1

113

1

99

1

37

1

108

1

85

1

138

1

82

1

58

1

54

1

112

1

105

1

27

1

135

1

59

1

37

1

74

1

75

1

40

1

130

1

86

1

33

1

56

1

133

1

41

1

125

1

125

1

48

1

22

1

90

1

134

1

33

1

78

1

80

1

61

1

75

1

75

1

135

1

34

1

111

1

80

1

135

1

56

1

145

1

141

1

36

1

135

1

91

1

34

1

25

1

150

1

132

1

38

1

135

1

64

1

87

1

36

1

38

1

154

1

41

1

16

1

70

1

31

1

147

1

133

1

63

1

88

1

100

1

56

1

102

1

124

1

48

1

47

1

30

1

91

1

84

1

138

1

45

1

140

1

144

1

45

1

144

1

96

1

20

1

25

1

134

1

70

1

83

1

72

1

67

1

81

1

51

1

48

1

90

1

67

1

135

1

68

1

38

1

55

1

42

1

135

1

62

1

135

1

101

1

144

1

61

1

36

1

77

1

140

1

116

1

141

1

57

1

40

1

119

1

135

1

135

1

134

1

134

1

63

1

135

1

113

1

135

1

138

1

26

1

100

1

108

1

141

1

115

1

77

1

138

1

131

1

64

1

141

1

147

1

30

1

87

1

38

1

52

1

103

1

92

1

98

1

7

1

62

1

117

1

38

1

52

1

95

1

88

1

92

1

51

1

82

1

135

1

135

1

122

1

41

1

132

1

29

1

71

1

64

1

31

1

91

1

137

1

107

1

112

1

45

1

33

1

84

1

58

1

97

1

52

1

69

1

99

1

140

1

23

1

72

1

72

1

46

1

85

1

48

1

66

1

63

1

19

1

82

1

54

1

49

1

52

1

72

1

61

1

87

1

132

1

30

1

25

1

135

1

18

1

41

1

26

1

86

1

135

1

30

1

116

1

135

1

67

1

141

1

68

1

73

1

56

1

126

1

47

1

88

1

37

1

20

1

119

1

141

1

83

1

135

1

138

1

50

1

130

1

31

1

75

1

59

1

30

1

30

1

132

1

103

1

91

1

75

1

60

1

30

1

109

1

90

1

11

1

31

1

64

1

57

1

147

1

90

1

48

1

51

1

76

1

130

1

70

1

50

1

48

1

49

1

64

1

12

1

142

1

94

1

72

1

134

1

125

1

26

1

56

1

47

1

44

1

139

1

15

1

18

1

129

1

77

1

16

1

38

1

80

1

134

1

108

1

101

1

75

1

48

1

25

1

51

1

141

1

100

1

135

1

138

1

144

1

138

1

130

1

41

1

141

1

135

1

48

1

54

1

98

1

58

1

77

1

99

1

87

1

28

1

126

1

13

1

107

1

141

1

53

1

68

1

64

1

49

1

85

1

43

1

38

1

63

1

133

1

14

1

93

1

120

1

138

1

51

1

115

1

75

1

66

1

106

1

42

1

80

1

99

1

68

1

89

1

41

1

37

1

114

1

56

1

137

1

60

1

73

1

57

1

36

1

85

1

55

1

99

1

88

1

54

1

41

1

32

1

110

1

123

1

135

1

144

1

109

1

59

1

109

1

93

1

72

1

59

1

135

1

44

1

22

1

51

1

58

1

64

1

16

1

15

1

51

1

29

1

38

1

114

1

113

1

73

1

58

1

57

1

48

1

141

1

138

1

59

1

56

1

80

1

107

1

57

1

134

1

57

1

40

1

146

1

24

1

27

1

30

1

36

1

36

1

43

1

134

1

64

1

36

1

57

1

60

1

63

1

65

1

135

1

28

1

122

1

111

1

38

1

32

1

84

1

38

1

46

1

120

1

44

1

34

1

57

1

67

1

44

1

68

1

92

1

27

1

144

1

69

1

36

1

95

1

138

1

77

1

116

1

73

1

65

1

18

1

41

1

44

1

71

1

93

1

38

1

76

1

31

1

116

1

92

1

135

1

70

1

129

1

72

1

48

1

47

1

36

1

98

1

86

1

63

1

95

1

145

1

36

1

78

1

141

1

144

1

86

1

111

1

135

1

144

1

135

1

138

1

55

1

70

1

135

1

135

1

125

1

140

1

104

1

88

1

81

1

43

1

63

1

63

1

118

1

62

1

94

1

92

1

110

1

27

1

120

1

135

1

53

1

93

1

134

1

79

1

68

1

82

1

72

1

126

1

134

1

41

1

119

1

68

1

104

1

81

1

101

1

134

1

135

1

128

1

115

1

135

1

59

1

130

1

29

1

77

1

59

1

48

1

46

1

37

1

117

1

102

1

119

1

144

1

137

1

73

1

138

1

34

1

134

1

89

1

125

1

135

1

121

1

70

1

37

1

135

1

72

1

101

1

130

1

56

1

137

1

135

1

78

1

144

1

133

1

137

1

20

1

141

1

137

1

27

1

32

1

21

1

51

1

59

1

18

1

53

1

58

1

119

1

61

1

36

1

82

1

62

1

54

1

56

1

63

1

48

1

73

1

74

1

62

1

67

1

97

1

126

1

45

1

60

1

69

1

96

1

41

1

138

1

69

1

128

1

100

1

70

1

124

1

112

1

6

1

39

1

101

1

36

1

135

1

127

1

66

1

45

1

71

1

43

1

38

1

136

1

97

1

136

1

138

1

44

1

27

1

134

1

110

1

117

1

55

1

22

1

39

1

84

1

134

1

138

1

25

1

120

1

61

1

64

1

69

1

102

1

46

1

77

1

142

1

65

1

59

1

44

1

75

1

66

1

67

1

58

1

94

1

120

1

59

1

44

1

135

1

42

1

23

1

135

1

55

1

65

1

46

1

39

1

15

1

101

1

88

1

89

1

34

1

41

1

36

1

114

1

73

1

134

1

62

1

135

1

52

1

91

1

135

1

64

1

75

1

38

1

143

1

144

1

26

1

49

1

56

1

46

1

141

1

77

1

138

1

54

1

83

1

58

1

141

1

32

1

101

1

107

1

135

1

136

1

115

1

119

1

147

1

110

1

89

1

40

1

89

1

134

1

141

1

135

1

116

1

88

1

48

1

146

1

144

1

60

1

71

1

18

1

93

1

75

1

135

1

67

1

138

1

53

1

135

1

143

1

32

1

119

1

135

1

78

1

81

1

105

1

61

1

49

1

43

1

144

1

138

1

81

1

45

1

135

1

134

1

144

1

43

1

74

1

47

1

30

1

135

1

35

1

32

1

38

1

132

1

141

1

94

1

56

1

91

1

119

1

53

1

66

1

105

1

45

1

89

1

64

1

81

1

108

1

133

1

51

1

83

1

42

1

74

1

74

1

58

1

129

1

33

1

44

1

68

1

111

1

94

1

25

1

54

1

35

1

145

1

72

1

55

1

119

1

136

1

92

1

62

1

18

1

91

1

38

1

94

1

78

1

32

1

37

1

102

1

123

1

139

1

34

1

69

1

36

1

59

1

135

1

86

1

93

1

153

1

113

1

54

1

100

1

76

1

138

1

109

1

92

1

62

1

84

1

134

1

84

1

34

1

48

1

103

1

134

1

96

1

81

1

59

1

72

1

59

1

51

1

73

1

68

1

117

1

55

1

136

1

141

1

134

1

106

1

65

1

54

1

144

1

131

1

50

1

138

1

65

1

97

1

96

1

44

1

73

1

99

1

88

1

31

1

131

1

30

1

41

1

38

1

18

1

136

1

104

1

66

1

43

1

141

1

122

1

100

1

62

1

59

1

141

1

130

1

34

1

84

1

39

1

14

1

63

1

100

1

94

1

99

1

37

1

137

1

125

1

111

1

91

1

46

1

109

1

35

1

125

1

43

1

52

1

62

1

120

1

45

1

135

1

39

1

120

1

135

1

46

1

79

1

62

1

49

1

142

1

131

1

159

1

144

1

82

1

137

1

84

1

138

1

121

1

36

1

135

1

138

1

135

1

153

1

133

1

37

1

109

1

65

1

100

1

75

1

138

1

35

1

80

1

45

1

23

1

119

1

138

1

49

1

83

1

76

1

85

1

54

1

75

1

66

1

100

1

97

1

134

1

135

1

25

1

27

1

75

1

121

1

54

1

135

1

33

1

65

1

79

1

78

1

57

1

80

1

65

1

38

1

32

1

103

1

46

1

28

1

37

1

38

1

49

1

58

1

65

1

77

1

135

1

120

1

12

1

26

1

66

1

97

1

98

1

26

1

78

1

139

1

131

1

122

1

119

1

100

1

144

1

30

1

28

1

143

1

53

1

97

1

137

1

55

1

139

1

37

1

75

1

46

1

101

1

50

1

134

1

36

1

89

1

40

1

74

1

108

1

47

1

48

1

99

1

19

1

112

1

138

1

23

1

59

1

130

1

31

1

64

1

21

1

76

1

88

1

35

1

43

1

135

1

21

1

127

1

28

1

108

1

34

1

57

1

64

1

143

1

57

1

96

1

65

1

91

1

50

1

111

1

23

1

113

1

97

1

55

1

47

1

69

1

72

1

41

1

136

1

137

1

112

1

141

1

75

1

73

1

52

1

105

1

139

1

88

1

83

1

109

1

135

1

116

1

72

1

57

1

56

1

138

1

63

1

97

1

135

1

139

1

28

1

35

1

87

1

135

1

59

1

115

1

26

1

38

1

92

1

86

1

99

1

57

1

82

1

120

1

138

1

31

1

50

1

36

1

135

1

101

1

49

1

45

1

45

1

115

1

134

1

135

1

141

1

35

1

138

1

63

1

52

1

63

1

145

1

140

1

144

1

86

1

137

1

135

1

26

1

116

1

141

1

147

1

67

1

135

1

138

1

134

1

33

1

132

1

53

1

135

1

138

1

149

1

137

1

138

1

131

1

36

1

73

1

60

1

41

1

65

1

90

1

41

1

141

1

39

1

34

1

43

1

134

1

137

1

47

1

38

1

58

1

36

1

96

1

113

1

47

1

60

1

104

1

93

1

101

1

47

1

136

1

60

1

77

1

28

1

23

1

58

1

81

1

141

1

55

1

TypeError: write() argument must be str, not float

#### Now we have generated a file containing comments for the further analysis.

## Pre-processing Comments

#### Importing required packages

In [12]:
import nltk
from nltk.corpus import stopwords # Import the stop word list
nltk.download("stopwords")
import sklearn 
import collections, re

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\VershaDeota\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

#### Cleaning the comments using regular expressions

In [13]:
def clean_comments(comment):
    #Convert to lower case
    comment = comment.lower()
    #Remove additional white spaces
    comment = re.sub('[\s]+', ' ', comment)
    #Replace #word with word
    comment = re.sub(r'#([^\s]+)', r'\1', comment)
    #Replace all non alphanumeric i.e remove all special characters
    comment=re.sub(r"\d*$", "", comment)
    comment=re.sub(r'[^\w]', ' ', comment)
    comment =re.sub("[.]", "", comment)
    #trim
    comment = comment.strip('\'"')
    #comment= comment.strip()
    comments_list=comment.split()
    return comments_list


#### Creating stopwords, positive and negative list of words

In [14]:
stop_words = list(stopwords.words('english'))

positive_words=open('PositiveWords.txt','r').read().split("\n")

negative_words=open('NegativeWords.txt','r').read().split("\n")

#### Cleaning the comment file and adding it in the list

In [15]:
import inflection  # it's a string transformation library.
final_comments=[]
#Read the file line by line
fp = open('comments.txt', 'r',encoding="utf8")
line = fp.readline()

while line:
    line = fp.readline()

    #Add the cleaned tweets to list
    clean_comment=clean_comments(line)
    
    #removing stopwords from the clean_tweets list
    for i in clean_comment:
        if i in stop_words:
            continue
        if i in positive_words:
            continue
        if i in negative_words:
            continue
        if len(i)<3:
            continue
        else:
            final_comments.append(inflection.singularize(i))    #To convert plural to singular word, like lights->light        
#end loop
fp.close()

## Finding the 10 most frequent words from comments

In [16]:
fdist = nltk.FreqDist(final_comments)
twenty_most_common=fdist.most_common(10)
twenty_most_common

[('light', 1285),
 ('sky', 595),
 ('white', 303),
 ('moving', 292),
 ('shaped', 282),
 ('orange', 275),
 ('red', 264),
 ('saw', 241),
 ('craft', 216),
 ('ufo', 213)]

#### Based on the word frequency in comments we can list the circumstances of UFO sightings across the U.S. states-

* People were mostly commenting about the lights they saw in the sky, of various colors mostly white, orange and red. 
* The object was moving. 
* People have commented about the different shape of the craft and UFO.