Skip to content

Commit

Permalink
Final version for mini project - just cutting our losses...
Browse files Browse the repository at this point in the history
  • Loading branch information
Bob Flagg committed Apr 23, 2013
1 parent d484bce commit d7bd6d2
Show file tree
Hide file tree
Showing 9 changed files with 46 additions and 3 deletions.
Binary file modified doc/density.pdf
Binary file not shown.
Binary file modified doc/dtn.pdf
Binary file not shown.
Binary file modified doc/mostfreq.pdf
Binary file not shown.
Binary file modified doc/overlap-network.pdf
Binary file not shown.
Binary file modified doc/topic-network.pdf
Binary file not shown.
Binary file modified doc/topic-networks.pdf
Binary file not shown.
39 changes: 37 additions & 2 deletions doc/topic-networks.tex
Expand Up @@ -55,7 +55,8 @@

\title[Topic Networks]{
Topic Networks \\
{\Large A mini project for Social Network Analysis}
{\Large A mini project for Social Network Analysis} \\
{\large Final Version}
}
%\author[Bob Flagg]{Bob Flagg}

Expand Down Expand Up @@ -300,7 +301,8 @@ \subsection{The Topic Network}

\begin{verbatim}
# Create the topic network:
topic.network.matrix <- t(dt.matrix) %*% dt.matrix
topic.network.matrix <- t(dt.matrix) %*%
dt.matrix
topic.network <- graph.adjacency(
topic.network.matrix, mode = "undirected")
\end{verbatim}
Expand Down Expand Up @@ -337,6 +339,39 @@ \subsection{The Topic Network}
\label{fig:topic-network}
\end{figure}

As Figure~\ref{fig:topic-network} shows, the topic network is very busy, suggesting
that we've not done a good job choosing the most import topics for each document.
In fact, as the code below shows, the average degree, at $19.3$, is very high.

\begin{verbatim}
# Compute the average degree of our network:
degrees <- V(topic.network)$degree
topic.network.average.degree <- sum(degrees) /
length(V(topic.network))
\end{verbatim}

\noindent This problem can probably be fixed if rather than taking the top 4 topics for each
document in the document-topic network, we were to choose only those topics whose
probability of appearing in the document is above a certain threshold. We plan to
investigate that approach but don't have time to pursue it here.

Once we've solved the previous problem, we plan to systematically investigate the connections between important
properties of the topic network and the underlying corpus. As an example, consider
centrality:

\begin{verbatim}
# Betweenness centrality
V(topic.network)$btwcnt <- betweenness(topic.network)
V(topic.network)$label[order(V(topic.network)$btwcnt)]
\end{verbatim}

\noindent This measure suggests that the most central topics in the corpus are
\begin{itemize}
\item tonight, mccain, page
\item religion, bankruptcy, christian
\item oil, fuel, auto
\item iraq, iraqi, troop
\end{itemize}


\subsection{The Overlap Network}
Expand Down
Binary file modified doc/wordcloud.pdf
Binary file not shown.
10 changes: 9 additions & 1 deletion r/topic-networks.R
Expand Up @@ -25,6 +25,7 @@
# Set the path to the corpus directory. #
# ---------------------------------------------------------------------------- #
corpus.directory <- '/home/birksworks/Projects/topic-networks/corpus'
setwd('/home/birksworks/Projects/topic-networks/Topic-Networks/doc/')
# ---------------------------------------------------------------------------- #
# Load required libraries. #
# ---------------------------------------------------------------------------- #
Expand Down Expand Up @@ -125,8 +126,12 @@ V(dt.network)$size[1:n.docs] <- 2
V(dt.network)$size[(n.docs+1):n.vertices] <- 6
E(dt.network)$width <- .5
E(dt.network)$color <- rgb(.5,.5,0,.4)
tkplot(dt.network)
dt.network$layout <- tkplot.getcoords(1)

pdf("dtn.pdf")
plot(dt.network, layout=layout.fruchterman.reingold)
plot(dt.network)
dev.off()
png("dtn.png")
plot(dt.network, layout=layout.fruchterman.reingold)
Expand All @@ -149,6 +154,9 @@ V(topic.network)$color <- rgb(0,1,0,.6)
# Set edge gamma according to edge weight
egam <- (log(E(topic.network)$weight)+.3)/max(log(E(topic.network)$weight)+.3)
E(topic.network)$color <- rgb(.5,.5,0,egam)
tkplot(topic.network)
topic.network$layout <- tkplot.getcoords(2)

pdf("topic-network.pdf")
plot(topic.network, layout=layout.kamada.kawai)
dev.off()
Expand All @@ -175,7 +183,7 @@ overlap.network <- simplify(overlap.network, remove.multiple=FALSE, remove.loops
overlap.network$layout <- layout.kamada.kawai(overlap.network)
V(overlap.network)$label <- topic.labels
tkplot(overlap.network)
overlap.network$layout <- tkplot.getcoords(1)
overlap.network$layout <- tkplot.getcoords(3)

# Set vertex attributes
V(overlap.network)$label.color <- rgb(0,0,.2,.6)
Expand Down

0 comments on commit d7bd6d2

Please sign in to comment.